Data storage system having multi-cast/unicast

ABSTRACT

A system interface includes a plurality of first directors, a plurality of second directors, a data transfer section and a message network. The data transfer section includes a cache memory. The cache memory is coupled to the plurality of first and second directors. The message network operates independently of the data transfer section and such network is coupled to the plurality of first directors and the plurality of second directors. The first and second directors control data transfer between the first directors and the second directors in response to messages passing between the first directors and the second directors through the message network to facilitate data transfer between the first directors and the second directors. The data passes through the cache memory in the data transfer section. A method is also provided for operating a data storage system adapted to transfer data between a host computer/server and a bank of disk drives. The method includes transferring messages through a message network with the data being transferred between the host computer/server and the bank of disk drives through a cache memory, such message network being independent of the cache memory.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of the filing dates of the following co-pending patent applications under the provisions of 35 U.S.C. §120:

Ser. No. 09/540,828, entitled “Data Storage System Having Separate Data Transfer Section And Message Network”, inventors Yuval Ofek, David L. Black, Stephen D. MacArthur, Richard Wheeler and Robert Thibault, filed Mar. 31, 2000;

Ser. No. 09/540,825, entitled “Data Storage System Having Separate Data Transfer Section and Message Network With Plural Directors On A Common Printed Circuit Board And Redundant Switching Networks”, inventors David L. Black, Richard Wheeler, Robert Thibault, Stephen D. MacArthur and Yuval Ofek, filed Mar. 31, 2000;

Ser. No. 09/539,966, entitled “Data Storage System Having Separate Data Transfer Section And Message Network With Plural Directors On A Common Printed Circuit Board”, inventor Stephen D. MacArthur, filed Mar. 31, 2000.

BACKGROUND OF THE INVENTION

This invention relates generally to data storage systems, and more particularly to data storage systems having redundancy arrangements to protect against total system failure in the event of a failure in a component or subassembly of the storage system.

As is known in the art, large host computers and servers (collectively referred to herein as “host computer/servers”) require large capacity data storage systems. These large computer/servers generally include data processors, which perform many operations on data introduced to the host computer/server through peripherals including the data storage system. The results of these operations are output to peripherals, including the storage system.

One type of data storage system is a magnetic disk storage system. Here a bank of disk drives and the host computer/server are coupled together through an interface. The interface includes “front end” or host computer/server controllers (or directors) and “back-end” or disk controllers (or directors). The interface operates the controllers (or directors) in such a way that they are transparent to the host computer/server. That is, data is stored in, and retrieved from, the bank of disk drives in such a way that the host computer/server merely thinks it is operating with its own local disk drive. One such system is described in U.S. Pat. No. 5,206,939, entitled “System and Method for Disk Mapping and Data Retrieval”, inventors Moshe Yanai, Natan Vishlitzky, Bruno Alterescu and Daniel Castel, issued Apr. 27, 1993, and assigned to the same assignee as the present invention.

As described in such U.S. Patent, the interface may also include, in addition to the host computer/server controllers (or directors) and disk controllers (or directors), addressable cache memories. The cache memory is a semiconductor memory and is provided to rapidly store data from the host computer/server before storage in the disk drives, and, on the other hand, store data from the disk drives prior to being sent to the host computer/server. The cache memory being a semiconductor memory, as distinguished from a magnetic memory as in the case of the disk drives, is much faster than the disk drives in reading and writing data.

The host computer/server controllers, disk controllers and cache memory are interconnected through a backplane printed circuit board. More particularly, disk controllers are mounted on disk controller printed circuit boards. The host computer/server controllers are mounted on host computer/server controller printed circuit boards. And, cache memories are mounted on cache memory printed circuit boards. The disk director, host computer/server director, and cache memory printed circuit boards plug into the backplane printed circuit board. In order to provide data integrity in case of a failure in a director, the backplane printed circuit board has a pair of buses. One set of the disk directors is connected to one bus and another set of the disk directors is connected to the other bus. Likewise, one set of the host computer/server directors is connected to one bus and another set of the host computer/server directors is connected to the other bus. The cache memories are connected to both buses. Each one of the buses provides data, address and control information.

The arrangement is shown schematically in FIG. 1. Thus, the use of two buses B1, B2 provides a degree of redundancy to protect against a total system failure in the event that the controllers or disk drives connected to one bus fail. Further, the use of two buses increases the data transfer bandwidth of the system compared to a system having a single bus. Thus, in operation, when the host computer/server 12 wishes to store data, the host computer 12 issues a write request to one of the front-end directors 14 (i.e., host computer/server directors) to perform a write command. One of the front-end directors 14 replies to the request and asks the host computer 12 for the data. After the request has passed to the requesting one of the front-end directors 14, the director 14 determines the size of the data and reserves space in the cache memory 18 to store the data. The front-end director 14 then produces control signals on one of the address memory busses B1, B2 connected to such front-end director 14 to enable the transfer to the cache memory 18. The host computer/server 12 then transfers the data to the front-end director 14. The front-end director 14 then advises the host computer/server 12 that the transfer is complete. The front-end director 14 looks up in a Table, not shown, stored in the cache memory 18 to determine which one of the back-end directors 20 (i.e., disk directors) is to handle this request. The Table maps the host computer/server 12 addresses into an address in the bank 22 of disk drives. The front-end director 14 then puts a notification in a “mail box” (not shown and stored in the cache memory 18) for the back-end director 20, which is to handle the request, the amount of the data and the disk address for the data. The back-end directors 20 poll the cache memory 18 when they are idle to check their “mail boxes”. If the polled “mail box” indicates a transfer is to be made, the back-end director 20 processes the request, addresses the disk drive in the bank 22, reads the data from the cache memory 18 and writes it into the addresses of a disk drive in the bank 22.
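
For exposition only, this prior-art mail-box sequence can be sketched in software form. The following is a minimal Python sketch, assuming dictionary stand-ins for the cache memory 18, its address Table and its “mail boxes”; all names here are illustrative and not part of the prior-art system:

    # Minimal sketch of the PRIOR ART write flow of FIG. 1 (illustrative only).
    class CacheMemory:
        def __init__(self, address_table):
            self.blocks = {}         # data staged in the cache, keyed by disk address
            self.mailboxes = {}      # one "mail box" per back-end director
            self.address_table = address_table  # host address -> (back-end id, disk address)

    def front_end_write(cache, host_addr, data):
        back_end_id, disk_addr = cache.address_table[host_addr]  # Table lookup
        cache.blocks[disk_addr] = data                           # reserve space, store data
        cache.mailboxes.setdefault(back_end_id, []).append(      # notify back-end director
            {"disk_addr": disk_addr, "size": len(data)})

    def back_end_poll(cache, back_end_id, disk_bank):
        # Idle back-end directors poll their mail boxes over the shared buses.
        for note in cache.mailboxes.pop(back_end_id, []):
            disk_bank[note["disk_addr"]] = cache.blocks[note["disk_addr"]]

Note that every mail-box write and every poll in this sketch crosses the same cache memory (and, in FIG. 1, the same buses B1, B2) that carry the data itself; this contention is the problem addressed below.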

When data is to be read from a disk drive in bank 22 to the host computer/server 12 the system operates in a reciprocal manner. More particularly, during a read operation, a read request is instituted by the host computer/server 12 for data at specified memory locations (i.e., a requested data block). One of the front-end directors 14 receives the read request and examines the cache memory 18 to determine whether the requested data block is stored in the cache memory 18. If the requested data block is in the cache memory 18, the requested data block is read from the cache memory 18 and is sent to the host computer/server 12. If the front-end director 14 determines that the requested data block is not in the cache memory 18 (i.e., a so-called “cache miss”), the director 14 writes a note in the cache memory 18 (i.e., the “mail box”) that it needs to receive the requested data block. The back-end directors 20 poll the cache memory 18 to determine whether there is an action to be taken (i.e., a read operation of the requested block of data). The one of the back-end directors 20 which polls the cache memory 18 mail box and detects a read operation reads the requested data block and initiates storage of such requested data block in the cache memory 18. When the storage is completely written into the cache memory 18, a read complete indication is placed in the “mail box” in the cache memory 18. It is to be noted that the front-end directors 14 are polling the cache memory 18 for read complete indications. When one of the polling front-end directors 14 detects a read complete indication, such front-end director 14 completes the transfer of the requested data, which is now stored in the cache memory 18, to the host computer/server 12.
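
A companion sketch of the prior-art read path, reusing the illustrative CacheMemory above (again, the names are assumptions for exposition only):

    def front_end_read(cache, host_addr):
        back_end_id, disk_addr = cache.address_table[host_addr]
        if disk_addr in cache.blocks:                   # block already cached
            return cache.blocks[disk_addr]
        cache.mailboxes.setdefault(back_end_id, []).append(
            {"disk_addr": disk_addr, "op": "read"})     # "cache miss": leave a note
        return None   # the front-end must now poll for a read complete indication

    def back_end_service(cache, back_end_id, disk_bank):
        for note in cache.mailboxes.pop(back_end_id, []):
            if note.get("op") == "read":
                cache.blocks[note["disk_addr"]] = disk_bank[note["disk_addr"]]
                cache.mailboxes.setdefault("read_complete", []).append(
                    note["disk_addr"])                  # read complete indication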

The use of mailboxes and polling requires time to transfer data between the host computer/server 12 and the bank 22 of disk drives, thus reducing the operating bandwidth of the interface.

SUMMARY OF THE INVENTION

In accordance with the present invention, a system interface is provided. Such interface includes a plurality of first directors, a plurality of second directors, a data transfer section and a message network. The data transfer section includes a cache memory. The cache memory is coupled to the plurality of first and second directors. The message network operates independently of the data transfer section and such network is coupled to the plurality of first directors and the plurality of second directors. The first and second directors control data transfer between the first directors and the second directors in response to messages passing between the first directors and the second directors through the message network to facilitate data transfer between the first directors and the second directors. The data passes through the cache memory in the data transfer section.

With such an arrangement, the cache memory in the data transfer section is not burdened with the task of transferring the director messaging but rather a message network is provided, operative independent of the data transfer section, for such messaging, thereby increasing the operating bandwidth of the system interface.

In one embodiment of the invention, each one of the first directors of the system interface includes a data pipe coupled between an input of such one of the first directors and the cache memory and a controller for transferring the messages between the message network and such one of the first directors.

In one embodiment each one of the second directors includes a data pipe coupled between an input of such one of the second directors and the cache memory and a controller for transferring the messages between the message network and such one of the second directors.

In one embodiment, each one of the first directors includes: a data pipe coupled between an input of such one of the first directors and the cache memory; a microprocessor; and a controller coupled to the microprocessor and the data pipe for controlling the transfer of the messages between the message network and such one of the first directors and for controlling the data between the input of such one of the first directors and the cache memory.

In accordance with another feature of the invention, a data storage system is provided for transferring data between a host computer/server and a bank of disk drives through a system interface. The system interface includes a plurality of first directors coupled to the host computer/server, a plurality of second directors coupled to the bank of disk drives, a data transfer section, and a message network. The data transfer section includes a cache memory. The cache memory is coupled to the plurality of first and second directors. The message network is operative independently of the data transfer section and such network is coupled to the plurality of first directors and the plurality of second directors. The first and second directors control data transfer between the host computer and the bank of disk drives in response to messages passing between the first directors and the second directors through the message network to facilitate the data transfer between the host computer/server and the bank of disk drives with such data passing through the cache memory in the data transfer section.

In accordance with yet another embodiment, a method is provided for operating a data storage system adapted to transfer data between a host computer/server and a bank of disk drives. The method includes transferring messages through a message network with the data being transferred between the host computer/server and the bank of disk drives through a cache memory, such message network being independent of the cache memory.

In accordance with another embodiment, a method is provided for operating a data storage system adapted to transfer data between a host computer/server and a bank of disk drives through a system interface. The interface includes a plurality of first directors coupled to the host computer/server, a plurality of second directors coupled to the bank of disk drives; and a data transfer section having a cache memory, such cache memory being coupled to the plurality of first and second directors. The method comprises transferring the data between the host computer/server and the bank of disk drives under control of the first and second directors in response to messages passing between the first directors and the second directors through a message network to facilitate the data transfer between the host computer/server and the bank of disk drives with such data passing through the cache memory in the data transfer section, such message network being independent of the cache memory.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of the invention will become more readily apparent from the following detailed description when read together with the accompanying drawings, in which:

FIG. 1 is a block diagram of a data storage system according to the PRIOR ART;

FIG. 2 is a block diagram of a data storage system according to the invention;

FIG. 2A shows the fields of a descriptor used in the system interface of the data storage system of FIG. 2;

FIG. 2B shows the fields used in a MAC packet used in the system interface of the data storage system of FIG. 2;

FIG. 3 is a sketch of an electrical cabinet storing a system interface used in the data storage system of FIG. 2;

FIG. 4 is a diagrammatical, isometric sketch showing printed circuit boards providing the system interface of the data storage system of FIG. 2;

FIG. 5 is a block diagram of the system interface used in the data storage system of FIG. 2;

FIG. 6 is a block diagram showing the connections between front-end and back-end directors to one of a pair of message network boards used in the system interface of the data storage system of FIG. 2;

FIG. 7 is a block diagram of an exemplary one of the director boards used in the system interface of the data storage system of FIG. 2;

FIG. 8 is a block diagram of the system interface used in the data storage system of FIG. 2;

FIG. 8A is a diagram of an exemplary global cache memory board used in the system interface of FIG. 8;

FIG. 8B is a diagram showing a pair of director boards coupled between a pair of host processors and global cache memory boards used in the system interface of FIG. 8;

FIG. 8C is a block diagram of an exemplary crossbar switch used in the front-end and back-end directors of the system interface of FIG. 8;

FIG. 9 is a block diagram of a transmit Direct Memory Access (DMA) used in the system interface of FIG. 8;

FIG. 10 is a block diagram of a receive DMA used in the system interface of FIG. 8;

FIG. 11 shows the relationship between FIGS. 11A and 11B, such FIGS. 11A and 11B together showing a process flow diagram of the send operation of a message network used in the system interface of FIG. 8;

FIGS. 11C-11E are examples of digital words used by the message network in the system interface of FIG. 8;

FIG. 11F shows bits in a mask used in such message network;

FIG. 11G shows the result of the mask of FIG. 11F applied to the digital word shown in FIG. 11E;

FIG. 12 shows the relationship between FIGS. 12A and 12B, such FIGS. 12A and 12B together showing a process flow diagram of the receive operation of a message network used in the system interface of FIG. 8;

FIG. 13 shows the relationship between FIGS. 13A and 13B, such FIGS. 13A and 13B together showing a process flow diagram of the acknowledgement operation of a message network used in the system interface of FIG. 8;

FIGS. 14A and 14B show process flow diagrams of the transmit DMA operation of the transmit DMA of FIG. 9; and

FIGS. 15A and 15B show process flow diagrams of the receive DMA operation of the receive DMA of FIG. 10.

DETAILED DESCRIPTION

Referring now to FIG. 2, a data storage system 100 is shown for transferring data between a host computer/server 120 and a bank of disk drives 140 through a system interface 160. The system interface 160 includes: a plurality of, here 32, front-end directors 180₁-180₃₂ coupled to the host computer/server 120 via ports 123₁-123₃₂; a plurality of back-end directors 200₁-200₃₂ coupled to the bank of disk drives 140 via ports 123₃₃-123₆₄; a data transfer section 240, having a global cache memory 220, coupled to the plurality of front-end directors 180₁-180₃₂ and the back-end directors 200₁-200₃₂; and a message network 260, operative independently of the data transfer section 240, coupled to the plurality of front-end directors 180₁-180₃₂ and the plurality of back-end directors 200₁-200₃₂, as shown. The front-end and back-end directors 180₁-180₃₂, 200₁-200₃₂ are functionally similar and include a microprocessor (μP) 299 (i.e., a central processing unit (CPU) and RAM), a message engine/CPU controller 314 and a data pipe 316 to be described in detail in connection with FIGS. 5, 6 and 7. Suffice it to say here, however, that the front-end and back-end directors 180₁-180₃₂, 200₁-200₃₂ control data transfer between the host computer/server 120 and the bank of disk drives 140 in response to messages passing between the directors 180₁-180₃₂, 200₁-200₃₂ through the message network 260. The messages facilitate the data transfer between the host computer/server 120 and the bank of disk drives 140 with such data passing through the global cache memory 220 via the data transfer section 240. More particularly, in the case of the front-end directors 180₁-180₃₂, the data passes between the host computer and the global cache memory 220 through the data pipe 316 in the front-end directors 180₁-180₃₂ and the messages pass through the message engine/CPU controller 314 in such front-end directors 180₁-180₃₂. In the case of the back-end directors 200₁-200₃₂, the data passes between the back-end directors 200₁-200₃₂, the bank of disk drives 140 and the global cache memory 220 through the data pipe 316 in the back-end directors 200₁-200₃₂ and again the messages pass through the message engine/CPU controller 314 in such back-end directors 200₁-200₃₂.

With such an arrangement, the cache memory 220 in the data transfer section 240 is not burdened with the task of transferring the director messaging. Rather, the message network 260 operates independent of the data transfer section 240, thereby increasing the operating bandwidth of the system interface 160.

In operation, and considering first a read request by the host computer/server 120 (i.e., the host computer/server 120 requests data from the bank of disk drives 140), the request is passed from one of a plurality of, here 32, host computer processors 121₁-121₃₂ in the host computer 120 to one or more of the pair of the front-end directors 180₁-180₃₂ connected to such host computer processor 121₁-121₃₂. (It is noted that in the host computer 120, each one of the host computer processors 121₁-121₃₂ is coupled to here a pair (but not limited to a pair) of the front-end directors 180₁-180₃₂, to provide redundancy in the event of a failure in one of the front-end directors 180₁-180₃₂ coupled thereto. Likewise, the bank of disk drives 140 has a plurality of, here 32, disk drives 141₁-141₃₂, each disk drive 141₁-141₃₂ being coupled to here a pair (but not limited to a pair) of the back-end directors 200₁-200₃₂, to provide redundancy in the event of a failure in one of the back-end directors 200₁-200₃₂ coupled thereto.) Each front-end director 180₁-180₃₂ includes a microprocessor (μP) 299 (i.e., a central processing unit (CPU) and RAM) and will be described in detail in connection with FIGS. 5 and 7. Suffice it to say here, however, that the microprocessor 299 makes a request for the data from the global cache memory 220. The global cache memory 220 has a resident cache management table, not shown. Every director 180₁-180₃₂, 200₁-200₃₂ has access to the resident cache management table and every time a front-end director 180₁-180₃₂ requests a data transfer, the front-end director 180₁-180₃₂ must query the global cache memory 220 to determine whether the requested data is in the global cache memory 220. If the requested data is in the global cache memory 220 (i.e., a read “hit”), the front-end director 180₁-180₃₂, more particularly the microprocessor 299 therein, mediates a DMA (Direct Memory Access) operation for the global cache memory 220 and the requested data is transferred to the requesting host computer processor 121₁-121₃₂.

If, on the other hand, the front-end director 180₁-180₃₂ receiving the data request determines that the requested data is not in the global cache memory 220 (i.e., a “miss”) as a result of a query of the cache management table in the global cache memory 220, such front-end director 180₁-180₃₂ concludes that the requested data is in the bank of disk drives 140. Thus the front-end director 180₁-180₃₂ that received the request for the data must make a request for the data from one of the back-end directors 200₁-200₃₂ in order for such back-end director 200₁-200₃₂ to request the data from the bank of disk drives 140. The mapping of which back-end directors 200₁-200₃₂ control which disk drives 141₁-141₃₂ in the bank of disk drives 140 is determined during a power-up initialization phase. The map is stored in the global cache memory 220. Thus, when the front-end director 180₁-180₃₂ makes a request for data from the global cache memory 220 and determines that the requested data is not in the global cache memory 220 (i.e., a “miss”), the front-end director 180₁-180₃₂ is also advised by the map in the global cache memory 220 of the back-end director 200₁-200₃₂ responsible for the requested data in the bank of disk drives 140. The requesting front-end director 180₁-180₃₂ then must make a request for the data in the bank of disk drives 140 from the map-designated back-end director 200₁-200₃₂. This request between the front-end director 180₁-180₃₂ and the appropriate one of the back-end directors 200₁-200₃₂ (as determined by the map stored in the global cache memory 220) is by a message which passes from the front-end director 180₁-180₃₂ through the message network 260 to the appropriate back-end director 200₁-200₃₂. It is noted then that the message does not pass through the global cache memory 220 (i.e., does not pass through the data transfer section 240) but rather passes through the separate, independent message network 260. Thus, communication between the directors 180₁-180₃₂, 200₁-200₃₂ is through the message network 260 and not through the global cache memory 220. Consequently, valuable bandwidth for the global cache memory 220 is not used for messaging among the directors 180₁-180₃₂, 200₁-200₃₂.
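
The power-up map lends itself to a small illustration. The sketch below, with assumed names and an assumed drive-to-director assignment policy (the document fixes the pairing by wiring, not software), shows a map from disk drive to responsible back-end director(s) being stored in a stand-in for the global cache memory 220 and consulted on a read “miss”:

    # Illustrative sketch: map formed at power-up and stored in the global cache.
    def build_backend_map(num_drives=32, num_backend=32):
        # Assumed policy: each drive is served by a redundant pair of back-end
        # directors; any assignment rule would do for this illustration.
        return {drive: (drive % num_backend, (drive + 1) % num_backend)
                for drive in range(num_drives)}

    def responsible_backends(global_cache, drive):
        # On a "miss" the front-end director is advised by the map of the
        # back-end director(s) responsible for the requested data.
        return global_cache["backend_map"][drive]

    global_cache = {"backend_map": build_backend_map()}
    assert responsible_backends(global_cache, 5) == (5, 6)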

Thus, on a global cache memory 220 “read miss”, the front-end director 180₁-180₃₂ sends a message to the appropriate one of the back-end directors 200₁-200₃₂ through the message network 260 to instruct such back-end director 200₁-200₃₂ to transfer the requested data from the bank of disk drives 140 to the global cache memory 220. When accomplished, the back-end director 200₁-200₃₂ advises the requesting front-end director 180₁-180₃₂ that the transfer is accomplished by a message, which passes from the back-end director 200₁-200₃₂ to the front-end director 180₁-180₃₂ through the message network 260. In response to the acknowledgement signal, the front-end director 180₁-180₃₂ is thereby advised that such front-end director 180₁-180₃₂ can transfer the data from the global cache memory 220 to the requesting host computer processor 121₁-121₃₂ as described above when there is a cache “read hit”.
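
This read-miss exchange can be sketched as message passing. In the sketch below the message network 260 is modeled as a simple queue per director; the function and field names are assumptions for exposition:

    from collections import deque

    network = {}   # director id -> inbound message queue (stand-in for network 260)

    def send(dest, msg):
        # Messages traverse the message network, never the cache memory.
        network.setdefault(dest, deque()).append(msg)

    def front_end_read_miss(fe_id, be_id, disk_addr):
        send(be_id, {"type": "read", "from": fe_id, "disk_addr": disk_addr})

    def back_end_handle(be_id, cache_blocks, disk_bank):
        while network.get(be_id):
            msg = network[be_id].popleft()
            if msg["type"] == "read":
                cache_blocks[msg["disk_addr"]] = disk_bank[msg["disk_addr"]]
                send(msg["from"], {"type": "ack", "disk_addr": msg["disk_addr"]})

On receiving the acknowledgement message, the front-end director transfers the data from the global cache memory 220 to the host, exactly as in the read “hit” case.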

It should be noted that there might be one or more back-end directors 200₁-200₃₂ responsible for the requested data. Thus, if only one back-end director 200₁-200₃₂ is responsible for the requested data, the requesting front-end director 180₁-180₃₂ sends a uni-cast message via the message network 260 to only that specific one of the back-end directors 200₁-200₃₂. On the other hand, if more than one of the back-end directors 200₁-200₃₂ is responsible for the requested data, a multi-cast message (here implemented as a series of uni-cast messages) is sent by the requesting one of the front-end directors 180₁-180₃₂ to all of the back-end directors 200₁-200₃₂ having responsibility for the requested data. In any event, with both a uni-cast and a multi-cast message, such message is passed through the message network 260 and not through the data transfer section 240 (i.e., not through the global cache memory 220).
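
Because the multi-cast is defined here as a series of uni-cast messages, it reduces to a loop over a uni-cast primitive (reusing the illustrative send function above):

    def send_multicast(dest_ids, msg):
        # Multi-cast implemented as a series of uni-cast messages, all of
        # which pass through the message network, not the cache memory.
        for dest in dest_ids:
            send(dest, dict(msg))   # independent copy per destination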

Likewise, it should be noted that while one of the host computer processors 121₁-121₃₂ might request data, the acknowledgement signal may be sent to the requesting host computer processor 121₁, or to one or more other host computer processors 121₁-121₃₂, via a multi-cast (i.e., sequence of uni-cast) message through the message network 260 to complete the data read operation.

Considering a write operation, the host computer 120 wishes to write data into storage (i.e., into the bank of disk drives 140). One of the front-end directors 180₁-180₃₂ receives the data from the host computer 120 and writes it into the global cache memory 220. The front-end director 180₁-180₃₂ then requests the transfer of such data after some period of time when the back-end director 200₁-200₃₂ determines that the data can be removed from such cache memory 220 and stored in the bank of disk drives 140. Before the transfer to the bank of disk drives 140, the data in the cache memory 220 is tagged with a bit as “fresh data” (i.e., data which has not been transferred to the bank of disk drives 140, that is, data which is “write pending”). Thus, if there are multiple write requests for the same memory location in the global cache memory 220 (e.g., a particular bank account) before being transferred to the bank of disk drives 140, the data is overwritten in the cache memory 220 with the most recent data. Each time data is transferred to the global cache memory 220, the front-end director 180₁-180₃₂ controlling the transfer also informs the host computer 120 that the transfer is complete to thereby free up the host computer 120 for other data transfers.

When it is time to transfer the data in the global cache memory 220 to the bank of disk drives 140, as determined by the back-end director 200₁-200₃₂, the back-end director 200₁-200₃₂ transfers the data from the global cache memory 220 to the bank of disk drives 140 and resets the tag associated with the data in the global cache memory 220 (i.e., un-tags the data) to indicate that the data in the global cache memory 220 has been transferred to the bank of disk drives 140. It is noted that the un-tagged data in the global cache memory 220 remains there until overwritten with new data.
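
The life cycle of the write-pending tag can be sketched as follows (assumed names; a single bit per cache slot, as the text describes):

    # Illustrative write-pending tagging for slots of the global cache memory.
    cache_slots = {}   # disk_addr -> {"data": ..., "write_pending": bool}

    def host_write(disk_addr, data):
        # Repeated writes before de-staging simply overwrite the slot; the
        # tag stays set so only the most recent data reaches the disk drives.
        cache_slots[disk_addr] = {"data": data, "write_pending": True}

    def destage(disk_addr, disk_bank):
        slot = cache_slots[disk_addr]
        if slot["write_pending"]:
            disk_bank[disk_addr] = slot["data"]
            slot["write_pending"] = False   # un-tag; data remains until overwritten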

Referring now to FIGS. 3 and 4, the system interface 160 is shown to include an electrical cabinet 300 having stored therein: a plurality of, here eight, front-end director boards 190₁-190₈, each one having here four of the front-end directors 180₁-180₃₂; a plurality of, here eight, back-end director boards 210₁-210₈, each one having here four of the back-end directors 200₁-200₃₂; and a plurality of, here eight, memory boards 220′ which together make up the global cache memory 220. These boards plug into the front side of a backplane 302. (It is noted that the backplane 302 is a mid-plane printed circuit board.) Plugged into the backside of the backplane 302 are message network boards 304₁, 304₂. The backside of the backplane 302 has plugged into it adapter boards, not shown in FIGS. 2-4, which couple the boards plugged into the front side of the backplane 302 with the computer 120 and the bank of disk drives 140 as shown in FIG. 2. That is, referring again briefly to FIG. 2, an I/O adapter, not shown, is coupled between each one of the front-end directors 180₁-180₃₂ and the host computer 120 and an I/O adapter, not shown, is coupled between each one of the back-end directors 200₁-200₃₂ and the bank of disk drives 140.

Referring now to FIG. 5, the system interface 160 is shown to include the director boards 190₁-190₈, 210₁-210₈ and the global cache memory 220 plugged into the backplane 302 and the disk drives 141₁-141₃₂ in the bank of disk drives along with the host computer 120 also plugged into the backplane 302 via I/O adapter boards, not shown. The message network 260 (FIG. 2) includes the message network boards 304₁ and 304₂. Each one of the message network boards 304₁ and 304₂ is identical in construction. A pair of message network boards 304₁ and 304₂ is used for redundancy and for message load balancing. Thus, each message network board 304₁, 304₂ includes a controller 306 (i.e., an initialization and diagnostic processor comprising a CPU, system controller interface and memory, as shown in FIG. 6 for one of the message network boards 304₁, 304₂, here board 304₁) and a crossbar switch section 308 (e.g., a switching fabric made up of here four switches 308₁-308₄).

Referring again to FIG. 5, each one of the director boards 190₁-210₈ includes, as noted above, four of the directors 180₁-180₃₂, 200₁-200₃₂ (FIG. 2). It is noted that the director boards 190₁-190₈ having four front-end directors per board, 180₁-180₃₂, are referred to as front-end director boards and the director boards 210₁-210₈ having four back-end directors per board, 200₁-200₃₂, are referred to as back-end director boards. Each one of the directors 180₁-180₃₂, 200₁-200₃₂ includes a CPU 310, a RAM 312 (which make up the microprocessor 299 referred to above), the message engine/CPU controller 314, and the data pipe 316.

Each one of the director boards 190₁-210₈ includes a crossbar switch 318. The crossbar switch 318 has four input/output ports 319, each one being coupled to the data pipe 316 of a corresponding one of the four directors 180₁-180₃₂, 200₁-200₃₂ on the director board 190₁-210₈. The crossbar switch 318 has eight output/input ports collectively identified in FIG. 5 by numerical designation 321, which plug into the backplane 302. The crossbar switch 318 on the front-end director boards 190₁-190₈ is used for coupling the data pipe 316 of a selected one of the four front-end directors 180₁-180₃₂ on the front-end director board 190₁-190₈ to the global cache memory 220 via the backplane 302 and I/O adapter, not shown. The crossbar switch 318 on the back-end director boards 210₁-210₈ is used for coupling the data pipe 316 of a selected one of the four back-end directors 200₁-200₃₂ on the back-end director board 210₁-210₈ to the global cache memory 220 via the backplane 302 and I/O adapter, not shown. Thus, referring to FIG. 2, the data pipe 316 in the front-end directors 180₁-180₃₂ couples data between the host computer 120 and the global cache memory 220 while the data pipe 316 in the back-end directors 200₁-200₃₂ couples data between the bank of disk drives 140 and the global cache memory 220. It is noted that there are separate point-to-point data paths P₁-P₆₄ (FIG. 2) between each one of the directors 180₁-180₃₂, 200₁-200₃₂ and the global cache memory 220. It is also noted that the backplane 302 is a passive backplane because it is made up of only etched conductors on one or more layers of a printed circuit board. That is, the backplane 302 does not have any active components.

Referring again to FIG. 5, each one of the director boards 190₁-210₈ includes a crossbar switch 320. Each crossbar switch 320 has four input/output ports 323, each one of the four input/output ports 323 being coupled to the message engine/CPU controller 314 of a corresponding one of the four directors 180₁-180₃₂, 200₁-200₃₂ on the director board 190₁-210₈. Each crossbar switch 320 has a pair of output/input ports 325₁, 325₂, which plug into the backplane 302. Each port 325₁-325₂ is coupled to a corresponding one of the message network boards 304₁, 304₂, respectively, through the backplane 302. The crossbar switch 320 on the front-end director boards 190₁-190₈ is used to couple the messages between the message engine/CPU controller 314 of a selected one of the four front-end directors 180₁-180₃₂ on the front-end director boards 190₁-190₈ and the message network 260, FIG. 2. Likewise, the crossbar switch 320 on the back-end director boards 210₁-210₈ is used to couple the messages produced by a selected one of the four back-end directors 200₁-200₃₂ on the back-end director board 210₁-210₈ between the message engine/CPU controller 314 of such selected back-end director and the message network 260 (FIG. 2). Thus, referring also to FIG. 2, instead of having a separate dedicated message path between each one of the directors 180₁-180₃₂, 200₁-200₃₂ and the message network 260 (which would require M individual connections to the backplane 302 for each of the directors, where M is an integer), here only M/4 individual connections are required. Thus, the total number of connections between the directors 180₁-180₃₂, 200₁-200₃₂ and the backplane 302 is reduced to ¼th. Thus, it should be noted from FIGS. 2 and 5 that the message network 260 (FIG. 2) includes the crossbar switch 320 and the message network boards 304₁, 304₂.

Each message is a 64-byte descriptor (shown in FIG. 2A) which is created by the CPU 310 (FIG. 5) under software control and is stored in a send queue in RAM 312. When the message is to be read from the send queue in RAM 312 and transmitted through the message network 260 (FIG. 2) to one or more other directors via a DMA operation to be described, it is packetized in the packetizer portion of packetizer/de-packetizer 428 (FIG. 7) into a MAC type packet, shown in FIG. 2B, here using the NGIO protocol specification. There are three types of packets: a message packet; an acknowledgement packet; and a message network fabric management packet, the latter being used to establish the message network routing during initialization (i.e., during power-up). Each one of the MAC packets has: an 8-byte header which includes source (i.e., transmitting director) and destination (i.e., receiving director) address; a payload; and terminates with a 4-byte Cyclic Redundancy Check (CRC), as shown in FIG. 2B. The acknowledgement packet (i.e., signal) has a 4-byte acknowledgment payload section. The message packet has a 32-byte payload section. The Fabric Management Packet (FMP) has a 256-byte payload section. The MAC packet is sent to the crossbar switch 320. The destination portion of the packet is used to indicate the destination for the message and is decoded by the switch 320 to determine the port to which the message is to be routed. The decoding process uses a decoder table 327 in the switch 320, such table being initialized during power-up by the initialization and diagnostic processor (controller) 306 (FIG. 5). The table 327 (FIG. 7) provides the relationship between the destination address portion of the MAC packet, which identifies the routing for the message, and the one of the four directors 180₁-180₃₂, 200₁-200₃₂ on the director board 190₁-190₈, 210₁-210₈, or the one of the message network boards 304₁, 304₂, to which the message is to be directed.
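
The packet sizes given above (8-byte header, 4-, 32- or 256-byte payload, 4-byte CRC) can be made concrete with a packing sketch. The field layout within the 8-byte header is an assumption here; only the overall sizes come from the text:

    import struct
    import zlib

    def build_mac_packet(src, dst, payload, bit_b=0):
        # Payload must be an acknowledgement (4), message (32) or FMP (256).
        assert len(payload) in (4, 32, 256)
        # Assumed 8-byte header layout: source, destination, bit B, padding.
        header = struct.pack(">HHBB2x", src, dst, bit_b, 0)
        body = header + payload
        crc = struct.pack(">I", zlib.crc32(body))   # 4-byte CRC trailer
        return body + crc

    pkt = build_mac_packet(src=1, dst=33, payload=b"\x00" * 32)
    assert len(pkt) == 8 + 32 + 4   # header + message payload + CRC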

More particularly, and referring to FIG. 5, a pair of output/input ports 325₁, 325₂ is provided for each one of the crossbar switches 320, each one being coupled to a corresponding one of the pair of message network boards 304₁, 304₂. Thus, each one of the message network boards 304₁, 304₂ has sixteen input/output ports 322₁-322₁₆, each one being coupled to a corresponding one of the output/input ports 325₁, 325₂, respectively, of a corresponding one of the director boards 190₁-190₈, 210₁-210₈ through the backplane 302, as shown. Thus, considering exemplary message network board 304₁, FIG. 6, each switch 308₁-308₄ also includes three coupling ports 324₁-324₃. The coupling ports 324₁-324₃ are used to interconnect the switches 308₁-308₄, as shown in FIG. 6. Thus, considering message network board 304₁, input/output ports 322₁-322₈ are coupled to output/input ports 325₁ of front-end director boards 190₁-190₈ and input/output ports 322₉-322₁₆ are coupled to output/input ports 325₁ of back-end director boards 210₁-210₈, as shown. Likewise, considering message network board 304₂, input/output ports 322₁-322₈ thereof are coupled, via the backplane 302, to output/input ports 325₂ of front-end director boards 190₁-190₈ and input/output ports 322₉-322₁₆ are coupled, via the backplane 302, to output/input ports 325₂ of back-end director boards 210₁-210₈.

As noted above, each one of the message network boards 304₁, 304₂ includes a processor 306 (FIG. 5) and a crossbar switch section 308 having four switches 308₁-308₄, as shown in FIGS. 5 and 6. The switches 308₁-308₄ are interconnected as shown so that messages can pass between any pair of the input/output ports 322₁-322₁₆. Thus, it follows that a message from any one of the front-end directors 180₁-180₃₂ can be coupled to another one of the front-end directors 180₁-180₃₂ and/or to any one of the back-end directors 200₁-200₃₂. Likewise, a message from any one of the back-end directors 200₁-200₃₂ can be coupled to another one of the back-end directors 200₁-200₃₂ and/or to any one of the front-end directors 180₁-180₃₂.

As noted above, each MAC packet (FIG. 2B) includes an address destination portion and a data payload portion. The MAC header is used to indicate the destination for the MAC packet and such MAC header is decoded by the switch to determine the port to which the MAC packet is to be routed. The decoding process uses a table in the switch 308₁-308₄, such table being initialized by processor 306 during power-up. The table provides the relationship between the MAC header, which identifies the destination for the MAC packet, and the route to be taken through the message network. Thus, after initialization, the switches 320 and the switches 308₁-308₄ in switch section 308 provide packet routing which enables each one of the directors 180₁-180₃₂, 200₁-200₃₂ to transmit a message between itself and any other one of the directors, regardless of whether such other director is on the same director board 190₁-190₈, 210₁-210₈ or on a different director board. Further, the MAC packet has an additional bit B in the header thereof, as shown in FIG. 2B, which enables the message to pass through message network board 304₁ or through message network board 304₂. During normal operation, this additional bit B is toggled between a logic 1 and a logic 0 so that one message passes through one of the redundant message network boards 304₁, 304₂ and the next message passes through the other one of the message network boards 304₁, 304₂ to balance the load requirement on the system. However, in the event of a failure in one of the message network boards 304₁, 304₂, the non-failed one of the boards 304₁, 304₂ is used exclusively until the failed message network board is replaced.
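
The load-balancing use of bit B reduces to a per-message toggle with failover to the surviving board. A sketch under those assumptions (names illustrative):

    class MessageBoardSelector:
        """Selects message network board 0 or 1 per message (illustrative)."""
        def __init__(self):
            self.bit_b = 0
            self.failed = set()   # indices of failed message network boards

        def next_board(self):
            # In the event of a failure, use the non-failed board exclusively
            # until the failed message network board is replaced.
            if self.failed:
                return ({0, 1} - self.failed).pop()
            self.bit_b ^= 1       # toggle bit B to balance the message load
            return self.bit_b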

Referring now to FIG. 7, an exemplary one of the director boards 190₁-190₈, 210₁-210₈, here director board 190₁, is shown to include directors 180₁, 180₃, 180₅ and 180₇. An exemplary one of the directors 180₁, 180₃, 180₅ and 180₇, here director 180₁, is shown in detail to include the data pipe 316, the message engine/CPU controller 314, the RAM 312, and the CPU 310 all coupled to the CPU interface bus 317, as shown. The exemplary director 180₁ also includes: a local cache memory 319 (which is coupled to the CPU 310); the crossbar switch 318; and the crossbar switch 320, described briefly above in connection with FIGS. 5 and 6. The data pipe 316 includes a protocol translator 400, a quad port RAM 402 and a quad port RAM controller 404 arranged as shown. Briefly, the protocol translator 400 converts between the protocol of the host computer 120, in the case of a front-end director 180₁-180₃₂ (and the protocol used by the disk drives in bank 140 in the case of a back-end director 200₁-200₃₂), and the protocol between the directors 180₁-180₃₂, 200₁-200₃₂ and the global memory 220 (FIG. 2). More particularly, the protocol used by the host computer 120 may, for example, be fibre channel, SCSI, ESCON or FICON, as determined by the manufacturer of the host computer 120, while the protocol used internal to the system interface 160 (FIG. 2) may be selected by the manufacturer of the interface 160. The quad port RAM 402 is a FIFO controlled by controller 404 because the rate data coming into the RAM 402 may be different from the rate data leaving the RAM 402. The RAM 402 has four ports, each adapted to handle an 18 bit digital word. Here, the protocol translator 400 produces 36 bit digital words for the system interface 160 (FIG. 2) protocol, one 18 bit portion of the word is coupled to one of a pair of the ports of the quad port RAM 402 and the other 18 bit portion of the word is coupled to the other one of the pair of the ports of the quad port RAM 402. The quad port RAM has a pair of ports 402A, 402B, each one of the ports 402A, 402B being adapted to handle an 18 bit digital word. Each one of the ports 402A, 402B is independently controllable and has independent, but arbitrated, access to the memory array within the RAM 402. Data is transferred between the ports 402A, 402B and the cache memory 220 (FIG. 2) through the crossbar switch 318, as shown.
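
The 36-bit word handling lends itself to a one-line illustration: the translator's word is split into two 18-bit halves, one per port of the pair (names assumed):

    def split_36_bit_word(word):
        # One 18-bit portion per quad port RAM port of the pair.
        assert 0 <= word < (1 << 36)
        return (word >> 18) & 0x3FFFF, word & 0x3FFFF

    hi, lo = split_36_bit_word(0x5A5A5A5A5)
    assert (hi << 18) | lo == 0x5A5A5A5A5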

The crossbar switch 318 includes a pair of switches 406A, 406B. Each one of the switches 406A, 406B includes four input/output director-side ports D₁-D₄ (collectively referred to above in connection with FIG. 5 as port 319) and four input/output memory-side ports M₁-M₄, M₅-M₈, respectively, as indicated. (The input/output memory-side ports M₁-M₄, M₅-M₈ were collectively referred to above in connection with FIG. 5 as port 321.) The director-side ports D₁-D₄ of switch 406A are connected to the 402A ports of the quad port RAMs 402 in each one of the directors 180₁, 180₃, 180₅ and 180₇, as indicated. Likewise, director-side ports of switch 406B are connected to the 402B ports of the quad port RAMs 402 in each one of the directors 180₁, 180₃, 180₅, and 180₇, as indicated. The ports D₁-D₄ are selectively coupled to the ports M₁-M₄ in accordance with control words provided to the switch 406A by the controllers in directors 180₁, 180₃, 180₅, 180₇ on busses R_A1-R_A4, respectively, and the ports D₁-D₄ are coupled to ports M₅-M₈ in accordance with the control words provided to switch 406B by the controllers in directors 180₁, 180₃, 180₅, 180₇ on busses R_B1-R_B4, as indicated. The signals on busses R_A1-R_A4 are request signals. Thus, port 402A of any one of the directors 180₁, 180₃, 180₅, 180₇ may be coupled to any one of the ports M₁-M₄ of switch 406A, selectively in accordance with the request signals on busses R_A1-R_A4. Likewise, port 402B of any one of the directors 180₁, 180₃, 180₅, 180₇ may be coupled to any one of the ports M₅-M₈ of switch 406B, selectively in accordance with the request signals on busses R_B1-R_B4. The coupling between the director boards 190₁-190₈, 210₁-210₈ and the global cache memory 220 is shown in FIG. 8.

More particularly, and referring also to FIG. 2, as noted above, each one of the host computer processors 121₁-121₃₂ in the host computer 120 is coupled to a pair of the front-end directors 180₁-180₃₂, to provide redundancy in the event of a failure in one of the front-end directors 180₁-180₃₂ coupled thereto. Likewise, the bank of disk drives 140 has a plurality of, here 32, disk drives 141₁-141₃₂, each disk drive 141₁-141₃₂ being coupled to a pair of the back-end directors 200₁-200₃₂, to provide redundancy in the event of a failure in one of the back-end directors 200₁-200₃₂ coupled thereto. Thus, considering exemplary host computer processor 121₁, such processor 121₁ is coupled to a pair of front-end directors 180₁, 180₂. Thus, if director 180₁ fails, the host computer processor 121₁ can still access the system interface 160, albeit by the other front-end director 180₂. Thus, directors 180₁ and 180₂ are considered a redundancy pair of directors. Likewise, other redundancy pairs of front-end directors are: front-end directors 180₃, 180₄; 180₅, 180₆; 180₇, 180₈; 180₉, 180₁₀; 180₁₁, 180₁₂; 180₁₃, 180₁₄; 180₁₅, 180₁₆; 180₁₇, 180₁₈; 180₁₉, 180₂₀; 180₂₁, 180₂₂; 180₂₃, 180₂₄; 180₂₅, 180₂₆; 180₂₇, 180₂₈; 180₂₉, 180₃₀; and 180₃₁, 180₃₂ (only directors 180₃₁ and 180₃₂ being shown in FIG. 2).

Likewise, disk drive 141₁ is coupled to a pair of back-end directors 200₁, 200₂. Thus, if director 200₁ fails, the disk drive 141₁ can still access the system interface 160, albeit by the other back-end director 200₂. Thus, directors 200₁ and 200₂ are considered a redundancy pair of directors. Likewise, other redundancy pairs of back-end directors are: back-end directors 200₃, 200₄; 200₅, 200₆; 200₇, 200₈; 200₉, 200₁₀; 200₁₁, 200₁₂; 200₁₃, 200₁₄; 200₁₅, 200₁₆; 200₁₇, 200₁₈; 200₁₉, 200₂₀; 200₂₁, 200₂₂; 200₂₃, 200₂₄; 200₂₅, 200₂₆; 200₂₇, 200₂₈; 200₂₉, 200₃₀; and 200₃₁, 200₃₂ (only directors 200₃₁ and 200₃₂ being shown in FIG. 2). Further, referring also to FIG. 8, the global cache memory 220 includes a plurality of, here eight, cache memory boards 220₁-220₈, as shown. Still further, referring to FIG. 8A, an exemplary one of the cache memory boards, here board 220₁, is shown in detail and is described in detail in U.S. Pat. No. 5,943,287 entitled “Fault Tolerant Memory System”, John K. Walton, inventor, issued Aug. 24, 1999 and assigned to the same assignee as the present invention, the entire subject matter therein being incorporated herein by reference. Thus, as shown in FIG. 8A, the board 220₁ includes a plurality of, here four, RAM memory arrays, each one of the arrays having a pair of redundant ports, i.e., an A port and a B port. The board itself has sixteen ports; a set of eight A ports M_A1-M_A8 and a set of eight B ports M_B1-M_B8. Four of the eight A ports, here A ports M_A1-M_A4, are coupled to the M₁ port of each of the front-end director boards 190₁, 190₃, 190₅, and 190₇, respectively, as indicated in FIG. 8. Four of the eight B ports, here B ports M_B1-M_B4, are coupled to the M₁ port of each of the front-end director boards 190₂, 190₄, 190₆, and 190₈, respectively, as indicated in FIG. 8. The other four of the eight A ports, here A ports M_A5-M_A8, are coupled to the M₁ port of each of the back-end director boards 210₁, 210₃, 210₅, and 210₇, respectively, as indicated in FIG. 8. The other four of the eight B ports, here B ports M_B5-M_B8, are coupled to the M₁ port of each of the back-end director boards 210₂, 210₄, 210₆, and 210₈, respectively, as indicated in FIG. 8.

Considering the exemplary four A ports M_A1-M_A4, each one of the four A ports M_A1-M_A4 can be coupled to the A port of any one of the memory arrays through the logic network 221_1A. Thus, considering port M_A1, such port can be coupled to the A port of the four memory arrays. Likewise, considering the four A ports M_A5-M_A8, each one of the four A ports M_A5-M_A8 can be coupled to the A port of any one of the memory arrays through the logic network 221_2A. Likewise, considering the four B ports M_B1-M_B4, each one of the four B ports M_B1-M_B4 can be coupled to the B port of any one of the memory arrays through logic network 221_1B. Likewise, considering the four B ports M_B5-M_B8, each one of the four B ports M_B5-M_B8 can be coupled to the B port of any one of the memory arrays through the logic network 221_2B. Thus, considering port M_B1, such port can be coupled to the B port of the four memory arrays. Thus, there are two paths, for data and control, by which either a front-end director 180₁-180₃₂ or a back-end director 200₁-200₃₂ can reach each one of the four memory arrays on the memory board. Thus, there are eight sets of redundant ports on a memory board, i.e., ports M_A1, M_B1; M_A2, M_B2; M_A3, M_B3; M_A4, M_B4; M_A5, M_B5; M_A6, M_B6; M_A7, M_B7; and M_A8, M_B8. Further, as noted above, each one of the directors has a pair of redundant ports, i.e., a 402A port and a 402B port (FIG. 7). Thus, for each pair of redundant directors, the A port (i.e., port 402A) of one of the directors in the pair is connected to one of the pair of redundant memory ports and the B port (i.e., 402B) of the other one of the directors in such pair is connected to the other one of the pair of redundant memory ports.

More particularly, referring to FIG. 8B, an exemplary pair of redundant directors is shown, here, for example, front-end director 180₁ and front-end director 180₂. It is first noted that the directors 180₁, 180₂ in each redundant pair of directors must be on different director boards, here boards 190₁, 190₂, respectively. Thus, here front-end director boards 190₁-190₈ have thereon: front-end directors 180₁, 180₃, 180₅ and 180₇; front-end directors 180₂, 180₄, 180₆ and 180₈; front-end directors 180₉, 180₁₁, 180₁₃ and 180₁₅; front-end directors 180₁₀, 180₁₂, 180₁₄ and 180₁₆; front-end directors 180₁₇, 180₁₉, 180₂₁ and 180₂₃; front-end directors 180₁₈, 180₂₀, 180₂₂ and 180₂₄; front-end directors 180₂₅, 180₂₇, 180₂₉ and 180₃₁; and front-end directors 180₂₆, 180₂₈, 180₃₀ and 180₃₂. Likewise, here back-end director boards 210₁-210₈ have thereon: back-end directors 200₁, 200₃, 200₅ and 200₇; back-end directors 200₂, 200₄, 200₆ and 200₈; back-end directors 200₉, 200₁₁, 200₁₃ and 200₁₅; back-end directors 200₁₀, 200₁₂, 200₁₄ and 200₁₆; back-end directors 200₁₇, 200₁₉, 200₂₁ and 200₂₃; back-end directors 200₁₈, 200₂₀, 200₂₂ and 200₂₄; back-end directors 200₂₅, 200₂₇, 200₂₉ and 200₃₁; and back-end directors 200₂₆, 200₂₈, 200₃₀ and 200₃₂.

Thus, here front-end director 180₁, shown in FIG. 8B, is on front-end director board 190₁ and its redundant front-end director 180₂, also shown in FIG. 8B, is on another front-end director board, here for example, front-end director board 190₂. As described above, the port 402A of the quad port RAM 402 (i.e., the A port referred to above) is connected to switch 406A of crossbar switch 318 and the port 402B of the quad port RAM 402 (i.e., the B port referred to above) is connected to switch 406B of crossbar switch 318. Likewise for redundant director 180₂. However, the ports M₁-M₄ of switch 406A of director 180₁ are connected to the M_A1 ports of global cache memory boards 220₁-220₄, as shown, while for its redundant director 180₂, the ports M₁-M₄ of switch 406A are connected to the redundant M_B1 ports of global cache memory boards 220₁-220₄, as shown.

Referring in more detail to the crossbar switch 318 (FIG. 7), as noted above, each one of the director boards 190₁-210₈ has such a switch 318 and such switch 318 includes a pair of switches 406A, 406B. Each one of the switches 406A, 406B is identical in construction, an exemplary one thereof, here switch 406A, being shown in detail in FIG. 8C. Thus switch 406A includes four input/output director-side ports D₁-D₄ as described in connection with exemplary director board 190₁. Thus, for the director board 190₁ shown in FIG. 7, the four input/output director-side ports D₁-D₄ of switch 406A are each coupled to the port 402A of a corresponding one of the directors 180₁, 180₃, 180₅, and 180₇ on the director board 190₁.

Referring again to FIG. 8C, the exemplary switch 406A includes a plurality of, here four, switch sections 430₁-430₄. Each one of the switch sections 430₁-430₄ is identical in construction and is coupled between a corresponding one of the input/output director-side ports D₁-D₄ and a corresponding one of the output/input memory-side ports M₁-M₄, respectively, as shown. (It should be understood that the output/input memory-side ports of switch 406B (FIG. 7) are designated as ports M₅-M₈, as shown. It should also be understood that while switch 406A is responsive to request signals on busses R_A1-R_A4 from the quad port controller 404 in directors 180₁, 180₃, 180₅, 180₇ (FIG. 7), switch 406B is responsive in like manner to request signals on busses R_B1-R_B4 from controller 404 in directors 180₁, 180₃, 180₅ and 180₇.) More particularly, controller 404 of director 180₁ produces request signals on busses R_A1 or R_B1. In like manner, controller 404 of director 180₃ produces request signals on busses R_A2 or R_B2, controller 404 of director 180₅ produces request signals on busses R_A3 or R_B3, and controller 404 of director 180₇ produces request signals on busses R_A4 or R_B4.

Considering exemplary switch section 430₁, such switch section 430₁ is shown in FIG. 8C to include a FIFO 432 fed by the request signal on bus R_A1. (It should be understood that the FIFOs, not shown, in switch sections 430₂-430₄ are fed by request signals R_A2-R_A4, respectively.) The switch section 430₁ also includes a request generator 434, an arbiter 436, and selectors 442 and 446, all arranged as shown. The data at the memory-side ports M₁-M₄, i.e., on busses DM1-DM4, are fed as inputs to selector 446. Also fed to selector 446 is a control signal produced by the request generator 434 on bus 449 in response to the request signal R_A1 stored in FIFO 432. The control signal on bus 449 indicates to the selector 446 the one of the memory-side ports M₁-M₄ which is to be coupled to director-side port D₁. The other switch sections 430₂-430₄ operate in like manner with regard to director-side ports D₂-D₄, respectively, and the memory-side ports M₁-M₄.

It is to be noted that the data portion of the word at port D₁ (i.e., the word on bus DD1) is also coupled to the other switch sections 430₂-430₄. It is further noted that the data portions of the words at ports D₂-D₄ (i.e., the words on busses DD2-DD4, respectively) are fed to the switch sections 430₁-430₄, as indicated. That is, each one of the switch sections 430₁-430₄ has the data portion of the words on ports D₁-D₄ (i.e., busses DD1-DD4), as indicated. It is also noted that the data portion of the word at port M₁ (i.e., the word on bus DM1) is also coupled to the other switch sections 430₂-430₄. It is further noted that the data portions of the words at ports M₂-M₄ (i.e., the words on busses DM2-DM4, respectively) are fed to the switch sections 430₁-430₄, as indicated. That is, each one of the switch sections 430₁-430₄ has the data portion of the words on ports M₁-M₄ (i.e., busses DM1-DM4), as indicated.

As will be described in more detail below, a request on bus R_A1 to switch section 430₁ is a request from the director 180₁ which identifies the one of the four ports M₁-M₄ in switch section 430₁ that is to be coupled to port 402A of director 180₁ (director-side port D₁). Thus, port 402A of director 180₁ may be coupled to one of the memory-side ports M₁-M₄ selectively in accordance with the data on bus R_A1. Likewise, requests on busses R_A2, R_A3, R_A4 to switch sections 430₂-430₄, respectively, are requests from the directors 180₃, 180₅, and 180₇, respectively, which identify the one of the four ports M₁-M₄ in switch sections 430₂-430₄ that is to be coupled to port 402A of directors 180₃, 180₅ and 180₇, respectively.

More particularly, the requests R_A1 are stored as they are produced by the quad port RAM controller 404 (FIG. 7) in receive FIFO 432. The request generator 434 receives from FIFO 432 the requests and determines which one of the four memory-side ports M₁-M₄ is to be coupled to port 402A of director 180₁. These requests for memory-side ports M₁-M₄ are produced on lines RA1,1-RA1,4, respectively. Thus, line RA1,1 (i.e., the request for memory-side port M₁) is fed to arbiter 436 and the requests from switch sections 430₂-430₄ (which are coupled to port 402A of directors 180₃, 180₅, and 180₇) on lines RA2,1, RA3,1 and RA4,1, respectively, are also fed to the arbiter 436, as indicated. The arbiter 436 resolves multiple requests for memory-side port M₁ on a first-come-first-served basis. The arbiter 436 then produces a control signal on bus 435 indicating the one of the directors 180₁, 180₃, 180₅ or 180₇ which is to be coupled to memory-side port M₁.

The control signal on bus 435 is fed to selector 442. Also fed to selector 442 is the data portion of the data at port D₁ (i.e., the data on data bus DD1), along with the data portions of the data at ports D₂-D₄ (i.e., the data on data busses DD2-DD4, respectively), as indicated. Thus, the control signal on bus 435 causes the selector 442 to couple to the output thereof the one of the data busses DD1-DD4 from the one of the directors 180₁, 180₃, 180₅, 180₇ being granted access to memory-side port M₁ by the arbiter 436. The selected output of selector 442 is coupled to memory-side port M₁. It should be noted that when the arbiter 436 receives a request via the signals on lines RA1,1, RA2,1, RA3,1 and RA4,1, acknowledgements are returned by the arbiter 436 via acknowledgement signals on lines AK1,1, AK1,2, AK1,3 and AK1,4, respectively, such signals being fed to the request generators 434 in switch sections 430₁, 430₂, 430₃ and 430₄, respectively.

Thus, the data on any port D₁-D₄ can be coupled to any one of the ports M₁-M₄ to effectuate the point-to-point data paths P₁-P₆₄ described above in connection with FIG. 2.

Referring again to FIG. 7, data from the host computer 120 (FIG. 2) is presented to the system interface 160 (FIG. 2) in batches from many host computer processors 121₁-121₃₂. Thus, the data from the host computer processors 121₁-121₃₂ are interleaved with each other as they are presented to a director 180₁-180₃₂. The batch from each host computer processor 121₁-121₃₂ (i.e., source) is tagged by the protocol translator 400, more particularly by a Tacheon ASIC in the case of a fibre channel connection. The controller 404 has a look-up table formed during initialization. As the data comes into the protocol translator 400 and is put into the quad port RAM 420 under the control of controller 404, the protocol translator 400 informs the controller 404 that the data is in the quad port RAM 420. The controller 404 consults its look-up table to determine the global cache memory 220 location (e.g., cache memory board 220₁-220₈) the data is to be stored into. The controller 404 thus produces the request signals on the appropriate bus R_(A1), R_(B1), and then tells the quad port RAM 420 that there is a block of data at a particular location in the quad port RAM 420 to be moved to the particular location in the global cache memory 220. The crossbar switch 318 also takes account of what the other controllers 404 in the directors 180₃, 180₅ and 180₇ on that particular director board 190₁ are asking by way of the request signals on busses R_(A2), R_(B2), R_(A3), R_(B3), R_(A4) and R_(B4), respectively. The arbitration of multiple requests is handled by the arbiter 436 as described above in connection with FIG. 8C.

Referring again to FIG. 7, the exemplary director 180₁ is shown to include the message engine/CPU controller 314. The message engine/CPU controller 314 is contained in a field programmable gate array (FPGA). The message engine (ME) 315 is coupled to the CPU bus 317 and the DMA section 408 as shown. The message engine (ME) 315 includes a Direct Memory Access (DMA) section 408, a message engine (ME) state machine 410, a transmit data buffer 422, a receive data buffer 424, a MAC packetizer/de-packetizer 428, send and receive pointer registers 420, and a parity generator 321. The DMA section 408 includes a DMA transmitter 418, shown and to be described below in detail in connection with FIG. 9, and a DMA receiver 420, shown and to be described below in detail in connection with FIG. 10, each of which is coupled to the CPU bus interface 317, as shown in FIG. 7. The message engine (ME) 315 includes the transmit data buffer 422 coupled to the DMA transmitter 418, the receive data buffer 424 coupled to the DMA receiver 420, registers 420 coupled to the CPU bus 317 through an address decoder 401, the packetizer/de-packetizer 428, described above, coupled to the transmit data buffer 422, the receive data buffer 424 and the crossbar switch 320, as shown, and the parity generator 321 coupled between the transmit data buffer 422 and the crossbar switch 320. More particularly, the packetizer portion 428P is used to packetize the message payload into a MAC packet (FIG. 2B) passing from the transmit data buffer 422 to the crossbar switch 320, and the de-packetizer portion 428D is used to de-packetize the MAC packet into message payload data passing from the crossbar switch 320 to the receive data buffer 424. The packetization is here performed by a MAC core which builds a MAC packet and appends to each message such things as a source and destination address designation indicating the director sending and receiving the message and a cyclic redundancy check (CRC), as described above. The message engine (ME) 315 also includes: a receive write pointer 450, a receive read pointer 452, a send write pointer 454, and a send read pointer 456.

Referring now to FIGS. 11 and 12, the transmission of a message from a director 180₁-180₃₂, 200₁-200₃₂ and the reception of a message by a director 180₁-180₃₂, 200₁-200₃₂ (here exemplary director 180₁ shown in FIG. 7) will be described. Considering first transmission of a message, reference is made to FIGS. 7 and 11. First, as noted above, at power-up the controller 306 (FIG. 5) of both message network boards 304₁, 304₂ initializes the message routing mapping described above for the switches 308₁-308₄ in switch section 308 and for the crossbar switches 320. As noted above, a request is made by the host computer 120. The request is sent to the protocol translator 400. The protocol translator 400 sends the request to the microprocessor 299 via CPU bus 317 and buffer 301. When the CPU 310 (FIG. 7) in the microprocessor 299 of exemplary director 180₁ determines that a message is to be sent to another one of the directors 180₂-180₃₂, 200₁-200₃₂ (e.g., the CPU 310 determines that there has been a “miss” in the global cache memory 220 (FIG. 2) and wants to send a message to the appropriate one of the back-end directors 200₁-200₃₂, as described above in connection with FIG. 2), the CPU 310 builds a 64-byte descriptor (FIG. 2A) which includes a 32-byte message payload indicating the addresses of the batch of data to be read from the bank of disk drives 140 (FIG. 2) (Step 500) and a 32-byte command field (Step 510) which indicates the message destination via an 8-byte bit vector, i.e., the director, or directors, which are to receive the message. That is, each one of the 64 bits in the 8-byte portion of the command field corresponds to one of the 64 directors. Here, a logic 1 in a bit indicates that the corresponding director is to receive the message and a logic 0 indicates that such corresponding director is not to receive the message. Thus, if the 8-byte word has more than one logic 1 bit, more than one director will receive the same message. As will be described, the same message will not be sent in parallel to all such directors but rather the same message will be sent sequentially to all such directors. In any event, the resulting 64-byte descriptor generated by the CPU 310 (FIG. 7) (Step 512) is written into the RAM 312 (Step 514), as shown in FIG. 11.
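
The descriptor format just described lends itself to a compact illustration. The following C sketch assumes a particular field ordering (payload first, then the command field beginning with the 8-byte destination vector); the patent does not fix the layout, so the struct and helpers are illustrative only.

    #include <stdint.h>
    #include <string.h>

    /* 64-byte descriptor: 32-byte payload + 32-byte command field.
       The first 8 bytes of the command field form the destination
       bit vector, one bit per director (bit k-1 <=> director k). */
    typedef struct {
        uint8_t  payload[32];       /* e.g., disk addresses to read */
        uint64_t dest_vector;       /* 8-byte destination bit vector */
        uint8_t  cmd_rest[24];      /* remainder of the command field */
    } descriptor_t;

    /* Mark director 'k' (1..64) as a recipient: logic 1 = receive. */
    static void add_destination(descriptor_t *d, int k)
    {
        d->dest_vector |= (uint64_t)1 << (k - 1);
    }

    /* More than one bit set => multi-cast; the message will be sent
       sequentially, not in parallel, to each flagged director. */
    static int is_multicast(const descriptor_t *d)
    {
        return (d->dest_vector & (d->dest_vector - 1)) != 0;
    }

    int main(void)
    {
        descriptor_t d;
        memset(&d, 0, sizeof d);
        add_destination(&d, 1);     /* front-end director 180_1 */
        add_destination(&d, 64);    /* back-end director 200_32 */
        return is_multicast(&d) ? 0 : 1;    /* multi-cast here */
    }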

More particularly, the RAM 312 includes a pair of queues: a send queue and a receive queue, as shown in FIG. 7. The RAM 312 is coupled to the CPU bus 317 through an Error Detection and Correction (EDAC)/memory control section 303, as shown. The CPU 310 then indicates to the message engine (ME) 315 state machine 410 (FIG. 7) that a descriptor has been written into the RAM 312. It should be noted that the message engine (ME) 315 also includes: the receive write pointer or counter 450, the receive read pointer or counter 452, the send write pointer or counter 454, and the send read pointer or counter 456, shown in FIG. 7. All four pointers 450, 452, 454 and 456 are reset to zero on power-up. As is also noted above, the message engine/CPU controller 314 also includes: the de-packetizer portion 428D of packetizer/de-packetizer 428, coupled to the receive data buffer 424 (FIG. 7), and the packetizer portion 428P of the packetizer/de-packetizer 428, coupled to the transmit data buffer 422 (FIG. 7). Thus, referring again to FIG. 11, when the CPU 310 indicates that a descriptor has been written into the RAM 312 and is now ready to be sent, the CPU 310 increments the send write pointer and sends it to the send write pointer register 454 via the register decoder 401. Thus, the contents of the send write pointer register 454 indicate the number of messages in the send queue 312S of RAM 312 which have not been sent. The state machine 410 checks the send write pointer register 454 and the send read pointer register 456 (Step 518). As noted above, both the send write pointer register 454 and the send read pointer register 456 are initially reset to zero during power-up. Thus, if the contents of the send read pointer register 456 and the send write pointer register 454 are different, the state machine 410 knows that there is a message in RAM 312 and that such message is ready for transmission. If a message is to be sent, the state machine 410 initiates a transfer of the stored 64-byte descriptor to the message engine (ME) 315 via the DMA transmitter 418 (FIG. 7) (Steps 520, 522). Descriptors are sent from the send queue 312S in RAM 312 until the send read pointer 456 is equal to the send write pointer 454.
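
The empty/non-empty test performed in Step 518 reduces to a comparison of the two pointers. The following C sketch models the send queue under an assumed depth of 256 slots; all identifiers are illustrative.

    #include <stdint.h>

    #define SENDQ_SLOTS 256                 /* assumed depth (power of two) */

    typedef struct {
        uint8_t  slots[SENDQ_SLOTS][64];    /* 64-byte descriptors */
        uint32_t write_ptr;                 /* like register 454 */
        uint32_t read_ptr;                  /* like register 456 */
    } send_queue;

    /* CPU side: descriptor written to RAM, then write pointer bumped. */
    static void enqueue(send_queue *q, const uint8_t desc[64])
    {
        for (int i = 0; i < 64; i++)
            q->slots[q->write_ptr % SENDQ_SLOTS][i] = desc[i];
        q->write_ptr++;                     /* signals the state machine */
    }

    /* State machine side: unequal pointers => a descriptor is pending;
       DMA it out, then advance the read pointer until they match. */
    static const uint8_t *dequeue(send_queue *q)
    {
        if (q->read_ptr == q->write_ptr)
            return 0;                       /* queue empty: nothing to send */
        return q->slots[q->read_ptr++ % SENDQ_SLOTS];
    }

    int main(void)
    {
        static send_queue q;                /* pointers power up at zero */
        uint8_t d[64] = {0};
        enqueue(&q, d);
        return dequeue(&q) ? 0 : 1;         /* one descriptor was pending */
    }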

As described above in connection with Step 510, the CPU 310 generates a destination vector indicating the director, or directors, which are to receive the message. As also indicated above, the command field is 32 bytes, eight bytes thereof having a bit representing a corresponding one of the 64 directors to receive the message. For example, referring to FIG. 11C, the bit positions 1-64 represent directors 180₁-180₃₂, 200₁-200₃₂, respectively. Here, in this example, because a logic 1 is only in bit position 1, the eight-byte vector indicates that the destination director is only front-end director 180₁. In the example in FIG. 11D, because a logic 1 is only in bit position 2, the eight-byte vector indicates that the destination director is only front-end director 180₂. In the example in FIG. 11E, because a logic 1 is in more than one bit position, the destination for the message is more than one director, i.e., a multi-cast message. In the example in FIG. 11E, a logic 1 is only in bit positions 2, 3, 63 and 64. Thus, the eight-byte vector indicates that the destination directors are only front-end directors 180₂ and 180₃ and back-end directors 200₃₁ and 200₃₂. There is a mask vector stored in a register of register section 420 (FIG. 7) in the message engine (ME) 315 which identifies the director or directors which may not be available to use (e.g., a defective director or a director not in the system at that time) (Steps 524, 525 for a uni-cast transmission). If the message engine (ME) 315 state machine 410 indicates that the director is available by examining the transmit vector mask (FIG. 11F) stored in register 420, the message engine (ME) 315 encapsulates the message payload with a MAC header and CRC inside the packetizer portion 428P, discussed above (Step 526). An example of the mask is shown in FIG. 11F. The mask has 64 bit positions, one for each one of the directors. Thus, as with the destination vectors described above in connection with FIGS. 11C-11E, bit positions 1-64 represent directors 180₁-180₃₂, 200₁-200₃₂, respectively. Here, in this example, a logic 1 in a bit position in the mask indicates that the representative director is available and a logic 0 in such bit position indicates that the representative director is not available. Here, in the example shown in FIG. 11F, only director 200₃₂ is unavailable. Thus, if the message has a destination vector as indicated in FIG. 11E, passing the destination vector through the mask of FIG. 11F modifies the destination vector to that shown in FIG. 11G. Thus, director 200₃₂ will not receive the message. Such mask modification to the destination vector is important because, as will be described, the messages on a multi-cast are sent sequentially and not in parallel. Thus, elimination of message transmission to an unavailable director or directors increases the message transmission efficiency of the system.
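
The mask operation of FIGS. 11E-11G amounts to a bitwise AND of the destination vector with the availability mask, followed by a sequential walk of the surviving bits. A short C illustration, reproducing the FIG. 11E/11F example, follows; the function names are illustrative.

    #include <stdint.h>
    #include <stdio.h>

    /* Apply the availability mask to the destination vector, then walk
       the surviving bits in order: multi-cast is realized as a series
       of sequential uni-casts, never a parallel broadcast. */
    static void send_message(uint64_t dest_vector, uint64_t avail_mask)
    {
        uint64_t effective = dest_vector & avail_mask; /* FIG. 11E & 11F -> 11G */
        for (int k = 1; k <= 64; k++) {
            if (effective & ((uint64_t)1 << (k - 1)))
                printf("uni-cast copy to director %d\n", k);
        }
    }

    int main(void)
    {
        /* Destinations: bit positions 2, 3, 63, 64 (as in FIG. 11E). */
        uint64_t dest = (1ULL << 1) | (1ULL << 2) | (1ULL << 62) | (1ULL << 63);
        /* Mask: all available except director 64 (200_32), as in FIG. 11F. */
        uint64_t mask = ~(1ULL << 63);
        send_message(dest, mask);       /* director 64 is silently dropped */
        return 0;
    }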

Having packetized the message into a MAC packet via the packetizer portion of the packetizer/de-packetizer 428 (FIG. 7), the message engine (ME) 315 transfers the MAC packet to the crossbar switch 320 (Step 528) and the MAC packet is routed to the destination by the message network 260 (Step 530) via message network boards 304₁, 304₂, or on the same director board via the crossbar switch 320 on such director board.

Referring to FIG. 12, the message read operation is described. Thus, in Step 600 the director waits for a message. When a message is received, the message engine (ME) 315 state machine 410 receives the packet (Step 602). The state machine 410 checks the receive bit vector mask (FIG. 11F, stored in register 426) against the source address of the packet (Step 604). If the state machine 410 determines that the message is from an improper source (i.e., a faulty director as indicated in the mask, FIG. 11F, for example), the packet is discarded (Step 606). On the other hand, if the state machine 410 determines that the packet is from a proper or valid director (i.e., source), the message engine (ME) 315 de-encapsulates the message from the packet (Step 608) in de-packetizer 428D. The state machine 410 in the message engine (ME) 315 initiates a 32-byte payload transfer via the DMA receive operation (Step 610). The DMA writes the 32-byte message to the receive queue 312R in the RAM 312 (Step 612). The message engine (ME) 315 state machine 410 then increments the receive write pointer register 450 (Step 614). The CPU 310 then checks whether the receive write pointer 450 is equal to the receive read pointer 452 (Step 616). If they are equal, such condition indicates to the CPU 310 that a message has not been received (Step 618). On the other hand, if the receive write pointer 450 and the receive read pointer 452 are not equal, such condition indicates to the CPU 310 that a message has been received, and the CPU 310 processes the message in the receive queue 312R of RAM 312, increments the receive read pointer, and writes it into the receive read pointer register 452. Thus, messages are stored in the receive queue 312R of RAM 312 until the contents of the receive read pointer 452 and the contents of the receive write pointer 450, which are initialized to zero during power-up, are equal.
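
The source-validity test of Steps 604-606 can likewise be expressed as a single bit test against the receive mask. A brief C sketch, with illustrative names, follows.

    #include <stdint.h>

    /* Receive-side check (Steps 604-606): a packet is accepted only if
       the bit for its source director is set in the receive mask;
       packets from a faulty or absent director are discarded. On an
       accept, the payload is DMA'd to the receive queue 312R and the
       receive write pointer is bumped (Steps 608-614). */
    static int packet_acceptable(int src_director /* 1..64 */,
                                 uint64_t receive_mask)
    {
        return (receive_mask & ((uint64_t)1 << (src_director - 1))) != 0;
    }

    int main(void)
    {
        uint64_t mask = ~(1ULL << 63);        /* director 64 marked faulty */
        return packet_acceptable(2, mask) &&  /* accept from director 2   */
               !packet_acceptable(64, mask)   /* discard from director 64 */
               ? 0 : 1;
    }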

Referring now to FIG. 13, the acknowledgement of a message operation is described. In Step 700 the receive DMA engine 420 successfully completes a message transfer to the receive queue in RAM 312 (FIG. 7). The state machine 410 in the message engine (ME) 315 generates an acknowledgement MAC packet and transmits the MAC packet to the sending director via the message network 260 (FIG. 2) (Steps 702, 704). The message engine (ME) 315 at the sending director de-encapsulates a 16-byte status payload in the acknowledgement MAC packet and transfers such status payload via a receive DMA operation (Step 706). The DMA of the sending (i.e., source) director writes to a status field of the descriptor within the RAM memory send queue 312S (Step 708). The state machine 410 of the message engine (ME) 315 of the sending director (which received the acknowledgement message) increments its send read pointer 456 (Step 712). The CPU 310 of the sending director (which received the acknowledgement message) processes the descriptor status and removes the descriptor from the send queue 312S of RAM 312 (Step 714). It should be noted that the send and receive queues 312S and 312R are each circular queues.

As noted above, the MAC packets are, during normal operation, transmitted alternately to one of the pair of message network boards 304₁, 304₂ by a hardware selector S in the crossbar switch 320. The selector S is responsive to the bit B in the header of the MAC packet (FIG. 2B): when such bit B is in one logic state the data is coupled to one of the message network boards (here, board 304₁), and in response to the opposite logic state the data is coupled to the other message network board (here, board 304₂). That is, when one message is transmitted to board 304₁ the next message is transmitted to board 304₂.
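
The alternation effected by bit B may be modeled as a toggling selector, as in the following C sketch; the sequence counter is an editorial stand-in for the hardware that flips bit B from one message to the next.

    #include <stdio.h>

    static unsigned msg_seq;            /* illustrative per-director counter */

    /* Returns 1 for board 304_1, 2 for board 304_2: successive messages
       alternate boards, mirroring the action of bit B and selector S. */
    static int select_board(void)
    {
        return (msg_seq++ & 1) ? 2 : 1;
    }

    int main(void)
    {
        for (int i = 0; i < 4; i++)
            printf("message %d -> board 304_%d\n", i, select_board());
        return 0;
    }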

Referring again to FIG. 9, the details of an exemplary transmit DMA 418 are shown. As noted above, a descriptor is created by the CPU 310 (FIG. 7) and is then stored in the RAM 312. If the send write pointer 454 (FIG. 7) and send read pointer 456, described above, have different counts, an indication is provided by the state machine 410 in the message engine (ME) 315 (FIG. 7) that the created descriptor is available for DMA transmission to the message engine (ME) 315; the payload of the descriptor is packetized into a MAC packet and sent through the message network 260 (FIG. 2) to one or more directors 180₁-180₃₂, 200₁-200₃₂. More particularly, the descriptor created by the CPU 310 is first stored in the local cache memory 319 and is later transferred to the send queue 312S in RAM 312. When the send write pointer 454 and send read pointer 456 have different counts, the message engine (ME) 315 state machine 410 initiates a DMA transmission as discussed above in connection with Step 520 (FIG. 11). Further, as noted above, the descriptor resides in the send queue 312S within the RAM 312. Further, as noted above, each descriptor which contains the message is a fixed size, here 64 bytes. As each new, non-transmitted descriptor is created by the CPU 310, it is stored sequentially at the next sequential location, or address, in the send queue 312S. Here, the address is a 32-bit address.

When the transmit DMA is initiated, the state machine 410 in the message engine (ME) 315 (FIG. 7) sends the queue address on bus 411 to an address register 413 in the DMA transmitter 418 (FIG. 9), along with a transmit write enable signal Tx_WE. The DMA transmitter 418 requests the CPU bus 317 by asserting a signal on Xmit_BR. The CPU bus arbiter 414 (FIG. 7) performs a bus arbitration and, when appropriate, the arbiter 414 grants the DMA transmitter 418 access to the CPU bus 317. The Xmit CPU state machine 419 then places the address currently available in the address register 413 on the address bus portion 317A of CPU bus 317 by loading the output address register 403. Odd parity is generated by a parity generator 405 before loading the output address register 403. The address in register 403 is placed on the CPU bus 317 (FIG. 7) for the RAM 312 send queue 312S, along with appropriate read control signals via CPU bus 317 portion 317C. The data at the address from the RAM 312 passes, via the data bus portion 317D of CPU bus 317, through a parity checker 415 to a data input register 417. The control signals from the CPU 310 are fed to the Xmit CPU state machine 419 via CPU bus 317 portion 317C. One of the control signals indicates whether the most recent copy of the requested descriptor is in the send queue 312S of the RAM 312 or still resident in the local cache memory 319. That is, the most recent descriptor at any given address is first formed by the CPU 310 in the local cache memory 319 and is later transferred by the CPU 310 to the queue in the RAM 312. Thus, there may be two descriptors with the same address, one in the RAM 312 and one in the local cache memory 319 (FIG. 7), the most recent one being in the local cache memory 319. In either event, the transmit DMA 418 must obtain the descriptor for DMA transmission from the RAM 312, and this descriptor is stored in the transmit buffer register 421 using signal 402 produced by the state machine 419 to load such register 421. The control signal from the CPU 310 to the Xmit CPU state machine 419 indicates whether the most recent descriptor is in the local cache memory 319. If the most recent descriptor is in the local cache memory 319, the Xmit CPU state machine 419 inhibits the data that was just read from send queue 312S in the RAM 312, and which has been stored in register 421, from passing to selector 423. In such case, state machine 419 must perform another data transfer at the same address location. The most recent message is then transferred by the CPU 310 from the local cache memory 319 to the send queue 312S in the RAM 312. The Xmit CPU state machine 419 then re-arbitrates for the CPU bus 317 and, after it is granted such CPU bus 317, reads the descriptor from the RAM 312. This time, however, the most recent descriptor is available in the send queue 312S in the RAM 312. The descriptor in the RAM 312 is now loaded into the transmit buffer register 421 in response to the assertion of the signal 402 by the Xmit CPU state machine 419. The descriptor in the register 421 is then transferred through selector 423 to the message bus interface 409 under the control of a Xmit message (msg) state machine 427. That is, the descriptor in the transmit buffer register 421 is transferred to the transmit data buffer 422 (FIG. 7) over the 32-bit transmit message bus interface 409 by the Xmit message (msg) state machine 427.
The data in the transmit data buffer 422 (FIG. 7) is packetized by the packetizer section of the packetizer/de-packetizer 428 as described in Step 526 in FIG. 11.
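
The stale-descriptor handling described above (read, detect cache residency, wait for write-back, re-read) is summarized in the following C sketch. The flat arrays and flag stand in for the RAM 312, the local cache memory 319 and the CPU control line; all are illustrative assumptions.

    #include <stdint.h>
    #include <string.h>

    static uint8_t ram[64];               /* send queue 312S slot */
    static uint8_t cache[64];             /* local cache memory 319 */
    static int     cache_has_newest;      /* control line from CPU 310 */

    static void cpu_flush(void)           /* CPU copies cache -> RAM 312 */
    {
        memcpy(ram, cache, 64);
        cache_has_newest = 0;
    }

    /* Transmit-DMA fetch honoring the coherency rule: if the freshest
       copy is still in the local cache 319, discard the first RAM read,
       let the CPU write it back, and read the same address again. */
    static void fetch_descriptor(uint8_t desc[64])
    {
        memcpy(desc, ram, 64);            /* first read from the send queue */
        if (cache_has_newest) {           /* stale: newest copy is in cache */
            cpu_flush();                  /* write-back to RAM 312 */
            memcpy(desc, ram, 64);        /* re-arbitrate and read again */
        }
    }

    int main(void)
    {
        uint8_t desc[64];
        cache[0] = 0xAB;                  /* CPU builds newest copy in cache */
        cache_has_newest = 1;
        fetch_descriptor(desc);
        return desc[0] == 0xAB ? 0 : 1;   /* DMA saw the fresh copy */
    }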

More particularly, and referring also to FIG. 14A, the method of operating the transmit DMA 418 (FIG. 9) is shown. As noted above, each descriptor is 64 bytes. Here, the transfer of the descriptor takes place over two interfaces, namely the CPU bus 317 and the transmit message interface bus 409 (FIG. 7). The CPU bus 317 is 64 bits wide, and eight 64-bit double-words constitute a 64-byte descriptor. The Xmit CPU state machine 419 generates the control signals which result in the transfer of the descriptor from the RAM 312 into the transmit buffer register 421 (FIG. 9). The 64-byte descriptor is transferred in two 32-byte burst accesses on the CPU bus 317. Each one of the eight double-words is stored sequentially in the transmit buffer register 421 (FIG. 9). Thus, in Step 800, the message engine 315 state machine 410 loads the transmit DMA address register 413 with the address, in the send queue 312S in RAM 312, of the descriptor to be transmitted. This is done by asserting the Tx_WE signal, which puts the Xmit CPU state machine 419 in Step 800; the state machine loads the address register 413 and proceeds to Step 802. In Step 802, the Xmit CPU state machine 419 loads the CPU transfer counter 431 (FIG. 9) with a 32-byte transfer count, here 2, which is the number of 32-byte transfers required to transfer the 64-byte descriptor. The Xmit CPU state machine 419 now proceeds to Step 804. In Step 804, the transmit DMA state machine 419 checks the validity of the address that is loaded into its address register 413. The address loaded into the address register 413 is checked against the values loaded into the memory address registers 435. The memory address registers 435 contain the base address and the offset of the send queue 312S in the RAM 312. The sum of the base address and the offset defines the range of addresses for the send queue 312S in RAM 312. The address check circuitry 437 constantly checks whether the address in the address register 413 is within the range of the send queue 312S in the RAM 312. If the address is found to be outside the range of the send queue 312S, the transfer is aborted; this status is stored in the status register 404 and then passed back to the message engine 315 state machine 410. The check for valid addresses is done in Step 805. If the address is within the range, i.e., valid, the transmit DMA state machine 419 proceeds with the transfer and proceeds to Step 806. In Step 806, the transmit DMA state machine 419 requests the CPU bus 317 by asserting the Xmit_BR signal to the arbiter 414 and then proceeds to Step 807. In Step 807, the Xmit CPU state machine 419 constantly checks whether it has been granted the bus by the arbiter. When the CPU bus 317 is granted, the Xmit CPU state machine proceeds to Step 808. In Step 808, the Xmit CPU state machine 419 generates an address and a data cycle which reads 32 bytes of the descriptor from the send queue 312S in the RAM 312 into its transmit buffer register 421. The Xmit CPU state machine 419 now proceeds to Step 810. In Step 810, the Xmit CPU state machine 419 loads the descriptor that was read into its buffer register 421 and proceeds to Step 811. In Step 811, a check is made for any local cache memory 319 coherency errors (i.e., a check whether the most recent data is in the cache memory 319 and not in the RAM 312) on these 32 bytes of data. If this data is detected to be resident in the local CPU cache memory 319, the Xmit CPU state machine 419 discards this data and proceeds to Step 806.
The Xmit CPU state machine 419 now requests the CPU bus 317 again and, when granted, transfers another 32 bytes of data into the transmit buffer register 421, by which time the CPU has already transferred the latest copy of the descriptor into the RAM 312. In cases when the 32 bytes of the descriptor initially fetched from the RAM 312 were not resident in the local CPU cache memory 319 (i.e., if no cache coherency errors were detected), the Xmit CPU state machine 419 proceeds to Step 812. In Step 812, the Xmit CPU state machine 419 decrements counter 431 and increments the address register 413 so that such address register 413 points to the next address. The Xmit CPU state machine then proceeds to Step 814. In Step 814, the transmit CPU state machine 419 checks whether the transfer counter 431 has expired, i.e., counted to zero. If the count is non-zero, it proceeds to Step 804 to start the transfer of the next 32 bytes of the descriptor. If the counter 431 is zero, the process goes to Step 816 to complete the transfer. The successful transfer of the second 32 bytes of the descriptor from the RAM 312 into the transmit DMA buffer register 421 completes the transfer over the CPU bus 317.
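
Steps 802-816 amount to an address-range check against the base/offset registers 435 followed by two 32-byte burst reads. The following C sketch assumes arbitrary base and size values for the send queue 312S; the abort path merely returns an error code here, whereas the hardware posts status to register 404.

    #include <stdint.h>

    /* Registers 435: base address and size (offset) of send queue 312S. */
    static const uint32_t sendq_base = 0x1000;   /* assumed values */
    static const uint32_t sendq_size = 0x4000;

    static int address_valid(uint32_t addr)      /* check circuitry 437 */
    {
        return addr >= sendq_base && addr < sendq_base + sendq_size;
    }

    /* A 64-byte descriptor moves as two 32-byte bursts on the CPU bus:
       the transfer counter starts at 2 and the address register advances
       after each burst (Steps 802-816, coherency retry omitted). */
    static int transmit_descriptor(uint32_t addr)
    {
        for (uint32_t count = 2; count > 0; count--, addr += 32) {
            if (!address_valid(addr))
                return -1;               /* abort; status to register 404 */
            /* burst_read_32_bytes(addr) would go here */
        }
        return 0;                        /* transfer over CPU bus complete */
    }

    int main(void)
    {
        return transmit_descriptor(0x1000) == 0 &&   /* in range: ok        */
               transmit_descriptor(0x8000) == -1     /* out of range: abort */
               ? 0 : 1;
    }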

The message interface 409 is 32 bits wide, and sixteen 32-bit words constitute a 64-byte descriptor. The 64-byte descriptor is transferred in batches of 32 bytes each. The Xmit msg state machine 427 controls and manages the interface 409. The Xmit CPU state machine asserts the signal 433 to indicate that the first 32 bytes have been successfully transferred over the CPU bus 317 (Step 818, FIG. 14B); this puts the Xmit msg state machine into Step 818 and starts the transfer on the message interface. In Step 820, the Xmit msg machine 427 resets the burst/transfer counters 439 and initiates the transfer over the message interface 409. In Step 820, the transfer is initiated over the message interface 409 by asserting the “transfer valid” (TX_DATA_Valid) signal, indicating to the message engine 315 state machine 410 that valid data is available on the bus 409. The transmit msg machine 427 transfers 32 bits of data on every subsequent clock until its burst counter in burst/transfer counter 439 reaches a value equal to eight (Step 822). The burst counter in burst/transfer counter 439 is incremented with each 32-bit word put on the message bus 409 by a signal on line 433. When the burst count is eight, a check is made by the state machine 427 as to whether the transfer counter in burst/transfer counter 439 has expired, i.e., is zero (Step 824). The expiry of the transfer counter in burst/transfer counter 439 indicates that the 64-byte descriptor has been transferred to the transmit buffer 422 in message engine 315. If it has expired, the transmit message state machine 427 proceeds to Step 826. In Step 826, the Xmit msg state machine asserts the output End of Transfer (Tx_EOT) signal, indicating the end of the transfer over the message bus 409. In this state, after the assertion of the Tx_EOT signal, the status of the transfer captured in the status register 404 is sent to the message engine 315 state machine 410. The DMA operation is complete with the descriptor being stored in the transmit buffer 422 (FIG. 7).

On the other hand, if the transfer counter in burst/transfer counter 439 has not expired, the process goes to Step 820 and repeats the above-described procedure to transfer the second 32 bytes of descriptor data, at which time the transfer will be complete.
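
The message-interface transfer thus consists of two bursts of eight 32-bit words. A compact C illustration of the counter structure follows; the printf calls stand in for the TX_DATA_Valid and Tx_EOT signalling and are illustrative only.

    #include <stdint.h>
    #include <stdio.h>

    /* The 64-byte descriptor leaves as sixteen 32-bit words, in two
       bursts of eight words each; the burst counter counts words within
       a burst and the transfer counter counts bursts (a sketch of the
       counters in block 439). */
    static void send_over_message_bus(const uint32_t words[16])
    {
        for (int burst = 0; burst < 2; burst++) {       /* transfer counter */
            for (int w = 0; w < 8; w++)                 /* burst counter    */
                printf("TX_DATA_Valid word %d: 0x%08x\n",
                       burst * 8 + w, words[burst * 8 + w]);
        }
        printf("Tx_EOT\n");          /* end of transfer over bus 409 */
    }

    int main(void)
    {
        uint32_t words[16] = {0};
        send_over_message_bus(words);
        return 0;
    }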

Referring now to FIG. 10, the receive DMA 420 is shown. Here, a message received from another director is to be written into the RAM 312 (FIG. 7). The receive DMA 420 is adapted to handle three types of information: error information, which is 8 bytes in size; acknowledgement information, which is 16 bytes in size; and receive message payload and/or fabric management information, which is 32 bytes in size. Referring also to FIG. 7, the message engine 315 state machine 410 asserts the Rx_WE signal, indicating to the receive DMA 420 that it is ready to transfer the data in its receive buffer 426 (FIG. 7). The data in the receive buffer could be the 8-byte error information, the 16-byte acknowledgement information or the 32-byte fabric management/receive message payload information. The state machine 410 places a 2-bit encoded receive transfer count on the Rx_transfer count signal, indicating the type of information, along with an address, which is the address where this information is to be stored in the receive queue of RAM 312. In response to the receive write enable signal Rx_WE, the receive message state machine 450 (FIG. 10) loads the address into the address register 452 and loads the transfer count, indicating the type of information, into the receive transfer counter 454. The address loaded into the address register 452 is checked by the address check circuitry 456 to see whether it is within the range of the receive queue addresses in the RAM 312. This is done by checking the address against the values loaded into the memory registers 457 (i.e., a base address register and an offset register therein). The base address register contains the start address of the receive queue 312R residing in the RAM 312 and the offset register contains the size of this receive queue 312R in RAM 312. Therefore, the additive sum of the values stored in the base address register and the offset register specifies the range of addresses of the receive queue 312R in the RAM 312. The memory registers 457 are loaded during initialization. On the subsequent clock after the assertion of the Rx_WE signal, the message engine 315 state machine 410 proceeds to place the data on a 32-bit message engine 315 data bus 407 (FIG. 10). An Rx_data_valid signal accompanies each 32 bits of data, indicating that the data on the message engine data bus 407 is valid. In response to this Rx_data_valid signal, the receive message state machine 450 loads the data on the data bus into the receive buffer register 460. The end of the transfer over the message engine data bus portion 407D of bus 407 is indicated by the assertion of the Rx_EOT signal, at which time the receive message state machine 450 loads the last 32 bits of data on the message engine data bus 407D of bus 407 into the receive buffer registers 460. This signals the end of the transfer over the message engine data bus 407D portion of bus 407. The end of such transfer is conveyed to the Rx_CPU state machine 462 by the assertion of the signal 464. The receive CPU state machine 462 now requests the CPU bus 317 by asserting the signal REC_Br. After an arbitration by CPU bus arbiter 414 (FIG. 7), the receive DMA 420 (FIG. 10) is given access to the CPU bus 317. The receive CPU state machine 462 proceeds to transfer the data in its buffer registers 460 over the CPU bus 317 into the receive queue 312R in the RAM 312. Simultaneously, this data is also transferred into a duplicate buffer register 466. The data at the output of the receive buffer register 460 passes to one input of a selector 470 and also passes to the duplicate receive buffer register 466.
The output of the duplicate receive buffer register 466 is fed to a second input of the selector 470. As the data is being transferred by the receive CPU state machine 462, it is also checked for cache coherency errors. If the data corresponding to the address being written into the RAM 312 is located in the CPU's local cache memory 319 (FIG. 7), the receive DMA 420 waits for the CPU 310 to copy the old data in its local cache memory 319 back to the receive queue 312R in the RAM 312 and then overwrites this old data with a copy of the new data from the duplicate buffer register 466.

More particularly, if the central processing unit 310 indicates to the DMA receiver 420 that the data in the receive buffer register 460 is available in the local cache memory 319, the receive CPU state machine 462 produces a select signal on line 463 which couples the data in the duplicate buffer register 466 to the output of selector 470 and then to the bus 317 for storage in the random access memory 312. The successful write into the RAM 312 completes the DMA transfer. The receive DMA 420 then signals the message engine 315 state machine 410 as to the status of the transfer. The status of the transfer is captured in the status register 459.
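
The duplicate-buffer recovery path can be condensed into a short C sketch: the payload is written to RAM and mirrored, and on a reported coherency conflict the write is replayed from the mirror. The arrays and flag below are illustrative stand-ins for the receive queue 312R, duplicate buffer register 466 and the CPU control line.

    #include <stdint.h>
    #include <string.h>

    static uint8_t ram_rxq[32];           /* receive queue 312R slot */

    /* Write the payload to RAM while mirroring it into a duplicate
       buffer; if the CPU reports the target address is cached ("old"
       data), replay the write from the duplicate after the CPU's
       write-back, as selector 470 does under the signal on line 463. */
    static void write_payload(const uint8_t data[32], int cpu_has_old_copy)
    {
        uint8_t duplicate[32];
        memcpy(duplicate, data, 32);      /* duplicate buffer register 466 */
        memcpy(ram_rxq, data, 32);        /* first write over CPU bus 317 */
        if (cpu_has_old_copy) {
            /* CPU copies its stale cache line back to 312R first ... */
            /* ... then the DMA overwrites it from the duplicate buffer. */
            memcpy(ram_rxq, duplicate, 32);
        }
    }

    int main(void)
    {
        uint8_t payload[32] = {0x5A};
        write_payload(payload, 1);        /* coherency-conflict case */
        return ram_rxq[0] == 0x5A ? 0 : 1;
    }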

Thus, with both the receive DMA and the transmit DMA, there is a checking of the local cache memory 319 to determine whether it has “old” data, in the case of the receive DMA, or whether it has “new” data, in the case of the transmit DMA.

Referring now to FIG. 15A, the operation of the receive DMA 420 is shown. Thus, in Step 830 the receive message state machine 450 checks whether the write enable signal Rx_WE is asserted. If it is found asserted, the receive DMA 420 proceeds to load the address register 452 and the transfer counter 454. The value loaded into the transfer counter 454 determines the type of DMA transfer requested by the message engine state machine 410 (FIG. 7). The assertion of the Rx_WE signal starts the DMA receive transfer operation and puts the Rx msg state machine 450 in Step 832. In Step 832, the receive msg state machine 450 loads the address register 452 and the transfer counter 454 and then proceeds to Step 834. In Step 834, it checks whether the Rx_DATA_VALID signal is asserted; if asserted, it proceeds to Step 836. In Step 836, the Rx msg state machine loads the buffer register 460 (FIG. 10) with the data on the message engine data bus 407D of bus 407 (FIG. 10). The Rx_DATA_VALID signal accompanies each piece of data put on the bus 407. The data is sequentially loaded into the buffer registers 460 (FIG. 10). The end of the transfer on the message engine data bus 407D of bus 407 is indicated by the assertion of the Rx_EOT signal. When the receive message state machine 450 is in the end-of-transfer state (Step 840), it signals the receive CPU state machine 462, and this starts the transfer on the CPU bus 317 side.

The flow for the receive CPU state machine 462 is explained below. Thus, referring to FIG. 15B, the end of the transfer on the message engine data bus 407D portion of bus 407 starts the receive CPU state machine 462 and puts it in Step 842. The receive CPU state machine 462 checks the validity of the address in this state (Step 844). This is done by the address check circuitry 456. If the address loaded in the address register 452 is outside the range of the receive queue 312R in the RAM 312, the transfer is aborted, the status is captured in the receive status register 459, and the receive CPU state machine 462 proceeds to Step 845. On a valid address the receive CPU state machine 462 goes to Step 846. In Step 846, the receive CPU state machine 462 requests access to the CPU bus 317. It then proceeds to Step 848. In Step 848 it checks for a grant on the bus 317. On a qualified grant it proceeds to Step 850. In Step 850, the receive CPU state machine 462 performs an address and a data cycle which writes the data in the buffer registers 460 into the receive queue 312R in RAM 312. Simultaneously with the write to the RAM 312, the data put on the CPU bus 317 is also loaded into the duplicate buffer register 466. At the same time, the CPU 310 also indicates on one of the control lines whether the data corresponding to the address written to in the RAM 312 is available in its local cache memory 319. At the end of the address and data cycle, the receive CPU state machine 462 checks for cache coherency errors of the type described above in connection with the transmit DMA 418 (FIG. 9). If a cache coherency error is detected, the receive CPU state machine 462 proceeds to Step 846 and retries the transaction; more particularly, the receive CPU state machine 462 now generates another address and data cycle to the previous address, and this time the data from the duplicate buffer 466 is put onto the CPU data bus 317. If there were no cache coherency errors, the receive CPU state machine 462 proceeds to Step 852, where it decrements the transfer counter 454 and increments the address in the address register 452. The receive CPU state machine 462 then proceeds to Step 854. In Step 854, the state machine 462 checks whether the transfer counter has expired, i.e., is zero. On a non-zero transfer count the receive CPU state machine 462 proceeds to Step 844 and repeats the above-described procedure until the transfer count becomes zero. A zero transfer count in Step 854 completes the write into the receive queue 312R in RAM 312 and the receive CPU state machine proceeds to Step 845. In Step 845, the status stored in the status register 459 is conveyed back to the message engine 315 state machine 410.

Referring again to FIG. 7, the interrupt control status register 412 will be described in more detail. As described above, a packet is sent by the packetizer portion of the packetizer/de-packetizer 428 to the crossbar switch 320 for transmission to one or more of the directors. It is to be noted that the packet sent by the packetizer portion of the packetizer/de-packetizer 428 passes through a parity generator PG in the message engine 315 prior to passing to the crossbar switch 320. When such packet is sent by the message engine 315 in exemplary director 180₁ to the crossbar switch 320, a parity bit is added to the packet by the parity bit generator PG prior to passing to the crossbar switch 320. The parity of the packet is checked in the parity checker portion of a parity checker/generator (PG/C) in the crossbar switch 320. The result of the check is sent by the PG/C in the crossbar switch 320 to the interrupt control status register 412 in the director 180₁.

Likewise, when a packet is transmitted from the crossbar switch 320 to the message engine 315 of exemplary director 180₁, the packet passes through the parity generator portion of the parity checker/generator (PG/C) in the crossbar switch 320 prior to being transmitted to the message engine 315 in director 180₁. The parity of the packet is then checked in the parity checker (PC) in director 180₁, and the result (i.e., status) is transmitted to the status register 412.
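
Odd parity, as generated by PG and verified by PC, can be stated concretely: the parity bit is chosen so that the total number of logic 1s across data plus parity is odd. A small C sketch follows; the byte-array packet representation is an illustrative assumption.

    #include <stdint.h>

    /* Count the 1-bits in the packet and return their parity (0 or 1). */
    static int ones_parity(const uint8_t *p, int n)
    {
        int ones = 0;
        for (int i = 0; i < n; i++)
            for (uint8_t b = p[i]; b; b &= b - 1)
                ones++;
        return ones & 1;
    }

    /* Generator PG: pick the parity bit making the total 1-count odd. */
    static int gen_odd_parity(const uint8_t *pkt, int n)
    {
        return ones_parity(pkt, n) ^ 1;
    }

    /* Checker PC: recompute and compare; a mismatch would be posted
       to the interrupt control status register 412. Returns 1 = good. */
    static int check_odd_parity(const uint8_t *pkt, int n, int parity_bit)
    {
        return (ones_parity(pkt, n) ^ parity_bit) == 1;
    }

    int main(void)
    {
        uint8_t pkt[4] = {0xFF, 0x00, 0x01, 0x02};
        int p = gen_odd_parity(pkt, 4);
        return check_odd_parity(pkt, 4, p) ? 0 : 1;   /* 0 = parity good */
    }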

A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.

What is claimed is:
1. A data storage system for transferring data between a host computer/server and a bank of disk drives through a system interface, such system interface comprising: a plurality of first directors coupled to the host computer/server; a plurality of second directors coupled to the bank of disk drives; a data transfer section coupled to the plurality of first directors and second directors; a messaging network coupled to the plurality of first directors and the plurality of second directors, such first and second directors controlling data transfer between the host computer and the bank of disk drives in response to messages passing between the directors through the messaging network as such data passes through the data transfer section; and wherein each one of such messages transferred through the messaging network is associated with a descriptor, such descriptor having a command field indicating the one or ones of the directors which are to receive such message, such command field having a plurality of bits, each bit being associated with a corresponding one of the directors, one logic state of such bit indicating that such corresponding director is to receive the message and another logic state of such bit indicating that such corresponding director is not to receive such message.
2. The data storage system recited in claim 1 wherein the message network transmits each message sequentially to a plurality of the directors.
3. The data storage system recited in claim 2 wherein each one of the directors has a mask stored therein, such mask having a plurality of bits, each one of such bits of the mask being associated with a corresponding one of the directors, each one of the bits indicating the availability or unavailability of the corresponding one of the directors.
4. The data storage system recited in claim 3 wherein the message network compares the command field of a message to be transmitted with the mask and sequentially transmits the message to only those directors which are indicated by the mask as being available.
5. A method for transferring data between a host computer/server and a bank of disk drives through a system interface, such system interface comprising: a plurality of first directors coupled to the host computer/server; a plurality of second directors coupled to the bank of disk drives; a data transfer section coupled to the plurality of first directors and second directors; and a messaging network coupled to the plurality of first directors and the plurality of second directors, such first and second directors controlling data transfer between the host computer and the bank of disk drives in response to messages passing between the directors through the messaging network as such data passes through the data transfer section; such method comprising: associating with each one of such messages transferred through the message network a descriptor, such descriptor having a command field indicating the one or ones of the directors which are to receive such message, such command field having a plurality of bits, each bit being associated with a corresponding one of the directors, one logic state of such bit indicating that such corresponding director is to receive the message and another logic state of such bit indicating that such corresponding director is not to receive such message.
6. The method recited in claim 5 including transmitting each message sequentially to a plurality of the directors.
7. The method recited in claim 6 including providing in each one of the directors a mask stored therein, such mask having a plurality of bits, each one of such bits of the mask being associated with a corresponding one of the directors, each one of the bits indicating the availability or unavailability of the corresponding one of the directors.
8. The method recited in claim 7 wherein each one of the directors compares the command field for a message to be transmitted with the mask and sequentially transmits the message to only those directors which are indicated by the mask as being available.
9. A data storage system for transferring data between a host computer/server and a bank of disk drives through a system interface, such system interface comprising: a plurality of first directors coupled to the host computer/server; a plurality of second directors coupled to the bank of disk drives; a cache memory; a data transfer section coupled to the plurality of first directors, the second directors, and the cache memory; a messaging network coupled to the plurality of first directors and the plurality of second directors, such first and second directors controlling data transfer between the host computer and the bank of disk drives in response to messages passing between the directors through the messaging network as such data passes through the cache memory via the data transfer section; and wherein each one of such messages transferred through the messaging network is associated with a descriptor, such descriptor having a command field indicating the one or ones of the directors which are to receive such message, such command field having a plurality of bits, each bit being associated with a corresponding one of the directors, one logic state of such bit indicating that such corresponding director is to receive the message and another logic state of such bit indicating that such corresponding director is not to receive such message.
10. The data storage system recited in claim 9 wherein the message network transmits each message sequentially to a plurality of the directors.
11. The data storage system recited in claim 10 wherein each one of the directors has a mask stored therein, such mask having a plurality of bits, each one of such bits of the mask being associated with a corresponding one of the directors, each one of the bits indicating the availability or unavailability of the corresponding one of the directors.
12. The data storage system recited in claim 11 wherein the message network compares the command field of a message to be transmitted with the mask and sequentially transmits the message to only those directors which are indicated by the mask as being available.
13. A method for transferring data between a host computer/server and a bank of disk drives through a system interface, such system interface comprising: a plurality of first directors coupled to the host computer/server; a plurality of second directors coupled to the bank of disk drives; a cache memory; a data transfer section coupled to the plurality of first directors, the second directors, and the cache memory; and a messaging network coupled to the plurality of first directors and the plurality of second directors, such first and second directors controlling data transfer between the host computer and the bank of disk drives in response to messages passing between the directors through the messaging network as such data passes through the cache memory via the data transfer section; such method comprising: associating with each one of such messages transferred through the message network a descriptor, such descriptor having a command field indicating the one or ones of the directors which are to receive such message, such command field having a plurality of bits, each bit being associated with a corresponding one of the directors, one logic state of such bit indicating that such corresponding director is to receive the message and another logic state of such bit indicating that such corresponding director is not to receive such message.
14. The method recited in claim 13 including transmitting each message sequentially to a plurality of the directors.
15. The method recited in claim 14 including providing in each one of the directors a mask stored therein, such mask having a plurality of bits, each one of such bits of the mask being associated with a corresponding one of the directors, each one of the bits indicating the availability or unavailability of the corresponding one of the directors.
16. The method recited in claim 15 wherein each one of the directors compares the command field for a message to be transmitted with the mask and sequentially transmits the message to only those directors which are indicated by the mask as being available.